\documentclass[12pt]{article}
\usepackage{setspace}
%\usepackage[numbers, sort&compress]{natbib}
\usepackage{natbib}
\usepackage{graphicx}
\usepackage{pslatex}
\usepackage{subfig}
\usepackage{float}
\usepackage{hyperref}
\usepackage{rotating}
\usepackage{lineno}
\usepackage{amsmath, amssymb, amsfonts,euscript,mathrsfs}
\usepackage{bm,bbm,latexsym,amsthm}
\usepackage{multirow}
\usepackage{psfig, epsfig, verbatim}
\usepackage[top=1.5in,bottom=1in,left=.5in,right=.5in]{geometry}
\usepackage{comment}
\usepackage{xr}
\externaldocument{Appendix}


\newcommand{\PMTen}{PM$_{10}$ }
\newcommand{\PMTwo}{PM$_{2.5}$ }
\newcommand\independent{\protect\mathpalette{\protect\independenT}{\perp}} \def\independenT#1#2{\mathrel{\rlap{$#1#2$}\mkern2mu{#1#2}}}
\newcommand{\Xbar}{\bar{X}}
\newcommand{\Ozone}{O$_3$ }
\newcommand{\SOTwo}{SO$_2$ }
\newcommand{\NOx}{NO$_x$ }
\newcommand{\COTwo}{CO$_2$ }
\input{commands-book.tex}
\newtheorem{as}{{Assumption}}
\newtheorem{thm}{{\bf Theorem}}
\makeatletter 
\newcommand*{\rom}[1]{\expandafter\@slowromancap\romannumeral #1@} 
\makeatother 

\doublespacing

\setcounter{secnumdepth}{0} % To get rid of numbering the sections but leaving a table of contents

\title{Causal Inference Methods for Estimating Long Term Health Effects of Air Quality Regulations}

\date{}
\author{
Corwin Matthew Zigler\thanks{Harvard TH Chan School of Public Health}
\and Chanmin Kim $^*$
\and Christine Choirat \thanks{Harvard University}
\and John Barrett Hansen$^\dagger$
\and Yun Wang $^*$
\and Lauren Hund \thanks {University of New Mexico}
\and Jonathan Samet\thanks{Keck School of Medicine, University of Southern California}
\and Gary King$^\dagger$
\and  Francesca Dominici$^*$
}

\begin{document}

\maketitle

\newpage
\tableofcontents

\newpage
\section{Abstract}
\textbf{Introduction:} The regulatory environment surrounding air pollution control policies warrants a new type of epidemiological evidence.  Whereas air pollution epidemiology has typically informed policies with estimates of exposure-response relationships between pollution and health outcomes, these estimates alone cannot support current debates surrounding the actual health impacts of air quality regulations.  Directly evaluating specific control strategies is distinct from estimating exposure-response, and increased emphasis on estimating the effectiveness of well-defined regulatory interventions will enhance the evidence supporting policy decisions.  The goal of this report is to provide new analytic perspectives and statistical methods for what we refer to as {\it direct accountability assessment} of the effectiveness of specific air quality regulatory interventions.  Towards this end, we sharpen many of the distinctions surrounding accountability assessment initially raised in \cite{hei_accountability_working_group_assessing_2003} through discussion, development, and deployment of statistical methods for drawing causal inferences from observational data.   The methods and analyses presented here are unified in their focus on anchoring accountability assessment to estimating the causal consequences of well-defined actions or interventions.  We discuss these analytic perspectives in the context of two direct accountability case studies considering different links of the so-called ``Chain of Accountability'' \citep{hei_accountability_working_group_assessing_2003}. 
\\
\textbf{Methods:} The statistical methods described herein consist of both established methods for drawing causal inference with observational data and of newly-developed methodology for causal accountability assessment.  We sharpen the analytic distinctions between studies that directly evaluate the effectiveness of specific policies and those that estimate the exposure-response relationship between pollution and health.  We emphasize how a potential-outcomes paradigm for causal inference can elevate current policy debates with more direct evidence of the extent to which complex regulatory interventions impact pollution and health outcomes.  We outline the potential outcomes perspective and promote its use as a means to frame observational studies as approximate randomized experiments.  Newly-developed methods for causal accountability assessment draw upon propensity scores,  principal stratification, causal mediation analysis, spatial hierarchical models, and Bayesian estimation.  The first of two illustrative case studies uses health outcomes among approximately 4 million Medicare beneficiaries to estimate the causal health impacts of areas being designated as in nonattainment with the 1987 National Ambient Air Quality Standards for \PMTen.  The second case study quantifies the extent to which \SOTwo scrubbers on coal-fired power plants causally affect emissions of \SOTwo, \NOx, and \COTwo, as well as the extent to which causal emissions reductions mediate the causal effect of a scrubber on ambient concentrations of \PMTwo.  Both case studies are anchored to our compilation of national, linked data on ambient air quality monitoring, weather, population demographics, Medicare hospitalization and mortality outcomes, continuous emissions monitoring for electricity-generating units, and a variety of regulatory control interventions.  
The resulting database has unprecedented accuracy and granularity for conducting the types of accountability assessment contained in this report, and a key component of our methods development for accountability assessment is the creation of tools for distribution of our linked database and for reproducible research. \\

\textbf{Results:}   The presentation of the \PMTen nonattainment case study focuses on illustrating the most fundamental features of a causal-inference perspective on direct accountability assessment. Results of this case study indicate that, among areas designated as nonattainment during 1991--1995, the nonattainment designations causally reduced all-cause Medicare mortality and respiratory-related hospitalizations relative to what would have occurred absent the designations.  The power plant emissions case study develops and illustrates our newly-developed statistical methods for multipollutant accountability assessment designed to quantify the causal pathways through which a regulatory action impacts ambient air quality.  Results of the power plant case study indicate that presence of an \SOTwo scrubber causally reduces ambient \PMTwo, that causal reductions in \SOTwo emissions mediate approximately one-third of this effect, and that nearly two-thirds of the causal \PMTwo reduction can be attributed to factors other than causal effects on \SOTwo emissions. \\

\textbf{Conclusion:}   Grounding accountability research in a potential outcomes framework and applying our new methods to our collection of national data sets provide additional sound evidence of the health effects of long-term and large-scale air quality regulations, adding to that previously reported.  Augmenting the existing body of research with rigorous evidence of the causal effects of well-defined actions will ensure that the highest-level epidemiological evidence continues to support regulatory policies.   Ultimately, our research provides support to the EPA and other stakeholders for incorporating health outcomes research into policy development. \\


\newpage
\section{Introduction}\label{intro}
%%%% Lifted from AJE Commentary %%%%%%%%%
The claim that exposure to ambient air pollution is harmful to human health is hardly controversial in this day and age, due in large part to the evidence amassed through decades of air pollution epidemiological research.  This body of research historically focused on hazard identification and more recently estimation of exposure-response (or, more formally, concentration-response) functions relating how health outcomes differ with spatial and/or temporal variations in ambient pollution exposure \citep{dockery_association_1993, pope_iii_particulate_1996, friedman_impact_2001,  krewski_overview_2003, laden_reduction_2006, zeger_mortality_2008, pope_iii_fine-particulate_2009, correia_effect_2013, chen_evidence_2013}.  Although considerable uncertainty remains with regard to essential finer-grade issues such as the specific shape of the exposure-response functions, the mechanics of exactly \textit{how} pollution harms the human body, and the achievement of an ``adequate margin of safety'' dictated by the US Clean Air Act (CAA), evidence of the exposure-response relationship between pollution and health has motivated a vast array of air quality control policies in the US and abroad.  The collection of these measures has undeniably improved ambient air quality over the past several decades \citep{u.s._epa_integrated_2009, samet_clean_2011}.


Despite the success of such regulatory policies for cleaning the air, an evolving regulatory and political environment is placing new demands on input from the scientific community.  With the prospect of increasing costs resulting from proposed tightening of air quality standards, the evidence motivating these policies is being subjected to unprecedented scrutiny, and the scientific community must adapt by providing new types of evidence to support current and future regulatory strategies \citep{samet_clean_2011, dominici_particulate_2014, zigler_point:_2014}.  Policy makers, legislators, industry, and the public increasingly emphasize questions of whether past efforts have actually yielded demonstrable improvements to public health, whether the costs associated with implementation of control policies such as the Clean Air Act (e.g., annual costs of the 1990 Amendments reaching \$65 billion by 2020 \citep{u.s._epa_benefits_2010}) are justified, and which existing strategies have provided the greatest health benefits.  These considerations reflect a shifting demand towards evidence of \textit{effectiveness} of specific regulatory interventions.  Starting most notably with a 2003 report from the Health Effects Institute \citep{hei_accountability_working_group_assessing_2003}, questions of so-called \textit{accountability assessment} (assessment of the extent to which regulatory actions taken to control air quality impact health outcomes) have been propelled to the forefront of policy debates. A National Research Council report commissioned by the US Congress recommended that an enhanced air quality management system strive to take a more performance-oriented approach by tracking effectiveness of specific control policies and creating accountability for results, with similar calls for the importance of accountability echoed by others, including EPA \citep{national_research_council_air_2004, hidy_technical_2011, hubbell_assessing_2012, u.s._epa_workshop_2013}. 
Increased emphasis on the direct study of the effectiveness of specific actions is one essential avenue to ensuring that epidemiological research continues to inform air quality control policies amid the current regulatory climate. 

%While the ten-plus years following HEI's initial report has seen an increase in studies framed as accountability (see\cite{van_erp_heis_2009, health_effects_institute_proceedings_2010, van_erp_recent_2012, van_erp_progress_2012}), these studies have been heterogeneous with regard to analytic perspective and specificity of evidence.  Relatively few accountability studies are designed to directly evaluate policies in line with the initial recommendations in \citep{hei_accountability_working_group_assessing_2003}, and consideration of complex long-term interventions of  direct relevance to regulatory policy has been particularly sparse.  

\subsection{Overview of this Report}
The goal of this report is to provide new analytic perspectives and statistical methods for what we refer to as {\it direct accountability assessment} of the effectiveness of specific air quality regulatory interventions.  Towards this end, we sharpen many of the distinctions surrounding accountability assessment initially raised in \cite{hei_accountability_working_group_assessing_2003} through discussion, development, and deployment of statistical methods for drawing causal inferences from observational data.   The methods and analyses presented here are unified in their focus on anchoring accountability assessment to estimating the causal consequences of well-defined actions or interventions.  We discuss these analytic perspectives in the context of two direct accountability case studies considering different links of the so-called ``Chain of Accountability'' \citep{hei_accountability_working_group_assessing_2003}. The statistical methods described herein consist of both established methods for drawing causal inference with observational data and of newly-developed methodology for causal accountability assessment.

\subsection{Case Study 1: \PMTen Nonattainment Designations}
As a result of the 1990 amendments to the CAA, EPA began officially designating US counties as nonattainment for \PMTen if 1) at least one pollution monitor in the county indicated a violation of the 1987 NAAQS for \PMTen during the three previous years or 2) part of the county was thought to contribute to a violation of the NAAQS for \PMTen in another area during that period.  A county nonattainment designation induced the state containing that county to submit a State Implementation Plan (SIP) detailing a strategy to achieve the NAAQS by a target date, and all counties not designated as nonattainment are considered in attainment and not required to produce a SIP.  The \nameref{pm10nonattainment} Section of this report presents an analysis of the extent to which initial \PMTen nonattainment designations causally impacted ambient \PMTen and health outcomes among Medicare beneficiaries.  The presentation of this case study focuses on illustrating the most fundamental features of a causal-inference perspective on direct accountability assessment.  The analysis focuses on three links in the Chain of Accountability: regulatory action, ambient air quality, and human health response.  
	
\subsection{Case Study 2: Scrubber Installations on Coal-Fired Power Plants}
%%%%%%%%%%%% Lifted from barrett's thesis %%%%%%%%%%%%%%%
The 1990 amendments to Title IV of the CAA established the Acid Rain Program (ARP), which instituted a requirement for major emission reductions of both \SOTwo and \NOx from stationary pollution sources.  One goal of this program was to reduce total \SOTwo emissions by ten million tons relative to 1980 levels (29.5 million tons per year). This drop was to be achieved mostly through cutting emissions from electricity-generating units (EGUs). One integral strategy to achieve the emissions-reduction goals of the program, especially among coal-fired EGUs, was the installation of flue-gas desulfurization technology (``scrubbers'') for reducing emissions of \SOTwo.  The \nameref{arpcasestudy} Section of this report presents an analysis of the extent to which  \SOTwo scrubbers on coal-fired power plants causally affected emissions of \SOTwo, \NOx, and \COTwo, as well as ambient concentrations of \PMTwo. The focus of this case study is the illustration of our newly-developed statistical methods for multipollutant accountability assessment designed to quantify the causal pathways through which a regulatory action impacts ambient air quality.  This analysis focuses on three links of the Chain of Accountability: regulatory action, emissions, and ambient air quality.  



\newpage
\section{Specific Aims}
%% Lifted from original application %%%%%%%%
\textbf{A.1: Use a potential outcomes framework to define causal effects of interest for single pollutant accountability assessment and develop methods for estimation (year 1).} We will develop a causal inference method for accountability research that uses principal stratification \citep{frangakis_principal_2002} to isolate the causal pathways leading from regulation to changes in air quality and health.  The proposed method will allow us to quantify and disentangle causal effects of the regulation on health that are 1) associated with causal effects of the regulation on air quality and 2) associated with causal pathways capturing other factors that do not involve changes in air quality (see Figure \ref{pathwaysfig}).\\

\textbf{A.2: Define causal effects for multipollutant accountability assessment and develop methods for estimation (years 1-2).}  Current statistical methods for assessing the consequences of air quality management rely on specification of a single pollutant and on estimation of the health effects of that pollutant.  We propose a method for multipollutant accountability research to estimate the joint effect of a regulation on multiple pollutants, allowing estimation of the (possibly synergistic) downstream effects on health.   \\

\textbf{A.3: Develop national databases, conduct epidemiological studies, and disseminate software and results (all years).}  We will assemble and link national data sets that will provide information on regulatory actions, ambient levels of criteria pollutants, health outcomes, and confounders for the entire U.S.  We will apply our proposed methods to our national data sets to estimate the impact on health indicators of separate regulations that target different pollutants.  The necessary software and computational tools for our methods will be disseminated in conjunction with results from our epidemiological studies.  \\
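
The two classes of effects targeted in Aim A.1 can be sketched with principal stratification notation.  The following is one common formulation, offered under stated assumptions rather than as the exact estimands used in this report; the symbols are ours and introduced only for illustration.  Let $A_i \in \{0,1\}$ indicate whether location $i$ is subject to the regulation, let $X_i(a)$ and $Y_i(a)$ denote the pollution and health outcomes that would occur under $A_i = a$, and let $c \ge 0$ be an analyst-chosen (hypothetical) threshold for a meaningful change in pollution.
\begin{align*}
\text{associative effect:} \quad & \mathrm{E}\left[\, Y_i(1) - Y_i(0) \mid X_i(1) - X_i(0) < -c \,\right], \\
\text{dissociative effect:} \quad & \mathrm{E}\left[\, Y_i(1) - Y_i(0) \mid \left| X_i(1) - X_i(0) \right| \le c \,\right].
\end{align*}
The first expression averages the health effect over locations where the regulation causally reduces pollution by more than $c$, corresponding to item 1) of Aim A.1; the second averages over locations where the regulation does not meaningfully change pollution, corresponding to item 2).  Because $X_i(1)$ and $X_i(0)$ are never both observed for the same location, membership in either stratum is unobserved and must be inferred under modeling assumptions.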


\newpage
\section{Methods and Study Design}
\subsection{Publicly-Available Data and Reproducible Research for  Accountability Assessment}\label{datasection}
A key component of our development of methods for accountability assessment has been the creation of data sources and tools for reproducible research.  We have created a national, linked database containing information on ambient air quality monitoring, weather, population demographics, Medicare hospitalization and mortality outcomes, EPA nonattainment designations, continuous emissions monitoring for over 4,000 power-generating units in the US, and emissions control technologies employed at these units.  Information in the database spans the years 1990--2015 and exhibits unprecedented accuracy and granularity for conducting the types of accountability assessment contained in this report. 

Our efforts towards transparency and reproducibility fall in three areas.  First, R packages to implement the newly developed methods are currently in process, and will be made available on the Comprehensive R Archive Network (CRAN).  Second, separate R packages are in development for the purposes of downloading, pre-processing, and linking the data sources described below. With the exception of Medicare data, all data sources used in this report are freely available and downloadable.  The Medicare data are publicly available, but must be purchased.  The programs we are currently developing will allow anyone with R to automatically download and integrate the freely-available data sources for use in their own research.  Finally, we are working to make the specific data sets used for our analyses (with appropriate privacy protections for health data) accessible through the Harvard Dataverse, an online repository for sharing, citing, and preserving research data \citep{king_introduction_2007, crosas_dataverse_2011}.  Distribution of our database and all software programs will permit any given study based on our database and software to be replicated using the appropriate database and software versions. 

\subsubsection{Ambient Monitoring Data\label{sub:Ambient-Monitoring-Data}}


\paragraph*{Air Quality System (AQS) from EPA (\protect\url{http://www.epa.gov/ttn/airs/airsaqs/}).}

We have developed scripts that retrieve daily and annual data at the
monitor level. These data, pre-processed by the EPA from hourly raw
data, contain measurements corresponding to over 10,000 monitors for
the period 1990 to 2014, approximately 5,000 of which are currently
active. Specifically, our scripts allow for obtaining data on: criteria
gases ($\mathrm{O}_{3}$, $\mathrm{SO}_{2}$, CO, and $\mathrm{NO}_{2}$);
particulate matter ($\mathrm{PM}_{2.5}$, $\mathrm{PM}_{2.5}$ non-FRM,
$\mathrm{PM}_{10}$ mass, and $\mathrm{PM}_{2.5}$ speciation);
meteorological factors (wind speed, temperature, barometric pressure,
and dew point); and toxics (HAPs and VOCs) and lead. 


\subsubsection{Medicare Health Outcomes Data}


\paragraph*{Centers for Medicare and Medicaid Services (CMS) (\protect\url{http://cms.hhs.gov/}).}

Data are available for 1999-2015. These data
provide restricted-access daily data at the zip code level. 
\begin{itemize}
\item \emph{Cohort}: All Medicare enrollees by year of enrollment, including
age, gender, race, state, 5- and 9-digit zip code identifiers for
their residence (40,000,000 people per year).
\item \emph{Mortality}: Date of death for enrollees within the Medicare
cohort.
\item \emph{Hospitalizations}: Hospitalization records for all Medicare
enrollees, including date of hospitalization, length of stay in hospital,
International Classification of Diseases (ICD) primary and secondary
diagnostic and procedure codes associated with the hospitalization,
and the costs billed to Medicare for the hospitalization (see \citet{dominici_fine_2006,peng_coarse_2008,zanobetti_national_2014}
for details).
\end{itemize}

\subsubsection{Regulatory Data: EPA Green Book and Power Plant Emissions Controls Data}
County-level nonattainment designations are available from the EPA Green Book for all criteria pollutants since 1978 (\url{http://www.epa.gov/airquality/greenbook/data_download.html}).  For the period 1995-2012, open-access daily data are available at
the EGU and power plant level from EPA's Air Markets Program Data (AMPD) (\url{http://ampd.epa.gov/ampd/}), where a power plant is defined as a facility with one or more EGUs. We have collected data on 4,164 units belonging to 1,248 facilities, the totality of facilities that participated in the Acid Rain Program. As shown in Figure \ref{fig:Relative-importance}, these monitored facilities are an important source of $\mathrm{SO}_{2}$
emissions: they represent approximately 20\% of all US power plants, but account for 75\% of fuel combustion emissions and 65\% of overall emissions. 

\begin{figure}
\begin{centering}
\includegraphics[scale=0.5]{Figures/hist_prop_arp}
\par\end{centering}

\protect\caption{\label{fig:Relative-importance}Relative importance of $\mathrm{SO}_{2}$
emissions from ARP-monitored facilities.}
\end{figure}


Specifically, we have collected information about each unit (e.g.,
state, county, latitude, longitude); ARP phase (I, II, opt-in, substitution,
compensating); $\mathrm{SO}_{2}$, $\mathrm{NO}_{x}$, and $\mathrm{CO}_{2}$
emissions; average $\mathrm{NO}_{x}$ emission rate; heat input, gross load, steam
load, operating time, and status; primary and secondary fuel types
(e.g., coal, diesel oil, natural gas); and scrubber technologies (whether
a scrubber is installed and, if so, the technology it uses) for $\mathrm{SO}_{2}$,
$\mathrm{NO}_{x}$ and particulate matter (PM). 


\subsubsection{Other Supporting Data Sources}
We obtain population demographic information for the year 2000 from the {\bf US Census Bureau} (\url{http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml}).  We obtain annual temperature data from the {\bf National Climatic Data Center} (\url{http://www.ncdc.noaa.gov/cdo-web/}).  County-level smoking rates are obtained from the CDC Behavioral Risk Factor Surveillance System (\url{http://www.cdc.gov/brfss/data_tools.htm}).  From the {\bf US Energy Information Administration (EIA)} (\url{http://www.eia.gov/}) we have retrieved monthly and annual data (from forms EIA-767, EIA-906A and EIA-923) for the period 1985-2012 at the power plant level, including:
the year that each unit was or is expected to be in compliance; strategy
for compliance for PM; actual or projected in-service and retirement
dates; primary fuels and alternate fuel capacity; monthly net electrical
generation; fuel; and monthly heat, ash, and sulfur contents. We have
included EIA information in the AMPD database, relying on the unique
facility and unit identifier that unambiguously corresponds to a power-generating
unit.

\subsubsection{Existing Data Linkage}\label{sub:Existing-Data-Linkage}

\paragraph*{Linking data sources and dealing with spatial misalignment.}

The data sources that we are using are reported for different geographical
units. ARP and AQS data are coordinate-based, whereas Medicare enrollee
data are available at the zip code level. Data linkage strategies have
to take into account spatial misalignment.


\paragraph*{AQS and Medicare.}

In order to link ambient monitoring sites to zip codes, which is the
basis for linking AQS data to Medicare health outcomes, we consider
each zip code in the U.S. and enumerate all EPA monitoring sites
within a certain distance. When a zip code is close to more
than one monitor, that zip code is linked only to the closest monitor.
After each zip code is linked to at most one monitoring site, Medicare
data are then obtained at the zip code level and aggregated to the level of the
monitoring site by combining data on all zip codes assigned to each
monitor. Our process ensures that each zip code is reliably assigned to only one monitoring
site, is computationally efficient, customizable (e.g., could
be used to link zip codes at various distances) and decouples
the zip code-AQS linkage from the combination with Medicare data so
that the linkage can be conducted without any specialized knowledge
of or access to Medicare data.
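
As a compact restatement of this linkage rule, the following sketch uses notation that is ours and introduced only for illustration: $d(z,m)$ denotes the distance between a reference point for zip code $z$ and monitoring site $m$, $r$ denotes the user-specified maximum linkage distance, $\ell(z)$ denotes the monitoring site to which zip code $z$ is linked, and $Y_z$ denotes a Medicare count (e.g., deaths or hospitalizations) for zip code $z$.  The choice of reference point (e.g., a zip code centroid), the distance metric, and combination by summation are assumptions of the sketch rather than details specified above.
\begin{align*}
\mathcal{M}_r(z) &= \left\{ m : d(z,m) \le r \right\}, \\
\ell(z) &= \operatorname*{arg\,min}_{m \in \mathcal{M}_r(z)} d(z,m) \quad \text{if } \mathcal{M}_r(z) \neq \emptyset, \\
Y_m &= \sum_{z \,:\, \ell(z) = m} Y_z .
\end{align*}
Zip codes with $\mathcal{M}_r(z) = \emptyset$ remain unlinked, so each zip code contributes to at most one monitor-level total $Y_m$.  Because $\ell(z)$ depends only on locations, the first two lines can be computed without access to Medicare data; only the final aggregation step requires it.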


\paragraph*{AMPD and Medicare and AMPD and AQS.}

Our linkage algorithm can also be used to link AMPD power plants to
AQS monitoring sites or zip codes for our analyses of the Acid Rain
Program.  Our goal is to provide computer programs and source code for
the monitor-zip code linkage that could be re-implemented for any
specified set of monitoring locations (e.g., all $\mathrm{PM}_{10}$
monitoring sites with ``population oriented'' monitors, or all monitoring
sites containing monitors for both $\mathrm{PM}_{10}$ and $\mathrm{O}_{3}$).
Another data process improvement pertains to inclusion/exclusion criteria
for monitoring sites. For example, we are able to reliably exclude
monitors that are calibrated on a micro-scale, or those for which
there are not enough daily measurements for a given year.


\subsubsection{The Harvard Dataverse}

Our datasets and databases (linking EPA and EIA sources and simulated Medicare information) are
accessible from a restricted-access Harvard Dataverse
(\url{https://dataverse.harvard.edu/airqualregs}), an online repository
for sharing, citing, and preserving research data (see \citet{king_introduction_2007}).  Upon completion of our database compilation and analyses, access to the Dataverse will become unrestricted.


\subsubsection{AREPA R Package}

We are also developing software programs within the R statistical
environment to allow for easier and broader distribution of our results.
In particular, we are developing the \texttt{arepa} (EPA data retrieving
and processing) package in a private GitHub repository (\url{https://github.com/czigler/arepa}).
Our codebase leverages the wealth of tools provided by R, more
specifically: (a) fast and efficient in-memory manipulation of large data sets
with \texttt{data.table} (\url{http://cran.r-project.org/web/packages/data.table/index.html}); and
(b) GIS capabilities with \texttt{sp} (\url{http://cran.r-project.org/web/packages/sp/index.html}).
The \texttt{arepa} repository will become public and the package will
be made available on the Comprehensive R Archive Network (CRAN; \url{http://cran.r-project.org/})
upon completion of the papers.

The \texttt{arepa} package is currently used within our Group and
provides three main groups of functionalities to improve the efficiency
and reproducibility of our workflow.
\begin{enumerate}
\item Script-based downloads for daily and annual AQS data, as described
in \nameref {sub:Ambient-Monitoring-Data} Section.
\item Spatial linkage procedures that implement the methods of the \nameref{sub:Existing-Data-Linkage} Section.  For example, Figure \ref{fig:Spatial-linkage-AMPD-PM10}
illustrates the spatial linkage between AMPD power plants and AQS
\PMTwo monitors in 2010 with a default radius of 100 km. Blue circles
correspond to a 100 km radius around a power plant that has been successfully
linked to a unique monitor, while grey circles correspond to power
plants that have been discarded for lack of a unique monitor within the
100 km range.
\item Creation of an indexed dataset from which Medicare data can be retrieved
at the zip code level and then aggregated
around AQS monitors or AMPD power plants.  We will provide simulated Medicare data to illustrate the data formats used in our 
routines so that they can be used by other research groups.
\end{enumerate}
\begin{figure}
\begin{centering}
\includegraphics[scale=0.25]{Figures/AMPD_AQS_2010}\protect\caption{\label{fig:Spatial-linkage-AMPD-PM10}Spatial linkage between AMPD
power plants and AQS \PMTwo monitors in 2010 }
\par\end{centering}
\end{figure}


\subsection{Statistical Perspectives for Causal Accountability Assessment}\label{causalinference}
%%%% Lifted from AJE Commentary %%%%%%%%%
The role of causality is of obvious import for informing policy decisions, and the causal validity (or lack thereof) of epidemiological evidence has always been central to the integration of scientific evidence into policy recommendations \citep{u.s._epa_integrated_2009}.  However, approaches to inferring causality from available observational data can vary depending on the scientific question of interest and the data available for analysis. 

Causal inference in air pollution epidemiology has most commonly been undertaken within a ``classical'' paradigm, which construes causal validity on a continuum according to how likely an observed association (e.g., between pollution and health) can be interpreted as ``causal'' \citep{glass_causal_2013}.  This continuum is explicitly considered in the approach to Integrated Science Assessments conducted by EPA, which classify evidence of the association between pollution exposure and health as a ``causal relationship,'' ``likely to be a causal relationship,'' ``suggestive of a causal relationship,'' ``inadequate to infer a causal relationship,'' or ``not likely to be a causal relationship.''  Even in the absence of the word ``causal,'' the bulk of air pollution epidemiology has been implicitly undertaken with this classical approach; an exposure-response relationship between pollution and health is estimated (e.g., in a cohort study), then a judgment is made as to whether this relationship can be reasonably interpreted as causal, and finally, hypothetical changes in exposure are input into the exposure-response function to infer the resulting ``health effect'' that would be caused by such a change in pollution. Many such studies have been integral to issues of accountability \citep{dockery_association_1993, laden_reduction_2006, zeger_mortality_2008, pope_iii_fine-particulate_2009, correia_effect_2013}. 

As an alternative to the classical paradigm, the potential-outcomes paradigm for causal inference has the distinctive feature that causal effects are explicitly defined as consequences of specific actions \citep{rubin_bayesian_1978}.  Rather than infer causality based on belief of whether an estimated exposure-response relationship can be interpreted as causal, potential-outcomes methods entail definition of a clearly-defined action (a ``cause''), the effects of which are of interest.  Some existing accountability assessments have been (often implicitly) undertaken within a potential-outcomes paradigm for causal inference, the common thread being application of the core tenets of experimentation to observational settings \citep{pope_iii_mortality_2007, moore_ambient_2010, currie_traffic_2011, rich_association_2012, chen_evidence_2013, friedman_impact_2001, hedley_cardiorespiratory_2002, clancy_effect_2002,  tonne_air_2008,  chay_clean_2003, greenstone_did_2004, zigler_estimating_2012, deschenes_defensive_2012}.  
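
To make the contrast with the classical paradigm concrete, a generic version of the potential-outcomes setup can be written as follows.  The notation is ours and introduced here only for illustration: $A_i \in \{0,1\}$ indicates whether unit $i$ (e.g., a monitoring location) is subject to the action of interest, $Y_i(a)$ denotes the outcome that would be observed for unit $i$ under $A_i = a$, and the first line assumes that each unit's outcome depends only on its own action status.
\begin{align*}
Y_i^{\mathrm{obs}} &= A_i \, Y_i(1) + (1 - A_i) \, Y_i(0), \\
\tau_i &= Y_i(1) - Y_i(0), \\
\tau^{\mathrm{ATT}} &= \mathrm{E}\left[\, Y_i(1) - Y_i(0) \mid A_i = 1 \,\right].
\end{align*}
Only one of $Y_i(1)$ and $Y_i(0)$ is observed for any unit, so the unit-level effect $\tau_i$ is never directly observed.  Summaries such as $\tau^{\mathrm{ATT}}$, the average effect among units that actually received the action, can be estimated from observational data only under assumptions (for example, that the action is not confounded given measured covariates), which play the role that randomization would play in an experiment.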

\subsubsection{Direct vs. Indirect Accountability}
Studies framed as accountability studies can be classified according to the specific scientific question of interest.  Studies that answer questions of the form: ``What is the relationship between exposure to pollution and health outcomes?'' can be aptly described as {\it indirect accountability} studies \citep{zigler_point:_2014}.  This type of question has been at the center of air pollution epidemiology for decades, and answers typically come in the form of exposure-response relationships between (changes in) pollution exposure and (changes in) health outcomes.  Importantly, these studies do not consider the effectiveness of any specific regulatory action, but provide valuable evidence for indirectly predicting the impact of policies.  For example, EPA routinely uses exposure-response estimates to estimate the expected benefits of current and future policies; if a policy reduces (or is expected to reduce) pollution by a certain amount, then the exposure-response relationship indirectly implies the health impact of the policy insofar as the relationship can be deemed causal \citep{u.s._epa_integrated_2009, u.s._epa_benefits_2010, u.s._epa_environmental_2012}.  Indirect accountability assessments typically assume that any observed exposure-response relationship would persist amid the complex realities of actual regulatory implementation that will typically impact a variety of factors.  As a consequence, health impacts of regulatory interventions may not be accurately characterized by indirectly applying exposure-response estimates to accountability assessments.

We focus on a slightly different perspective on accountability assessment that we describe as {\it direct accountability} \citep{zigler_point:_2014}.  Direct accountability studies target a different scientific question than studies of exposure-response relationships.  Rather than investigate the relationship between pollution and health, these studies answer the question ``What is the relationship between a specific regulatory intervention and health?''  These studies are ``direct'' accountability studies in that they directly evaluate the effectiveness of well-defined regulatory actions, which more definitively informs questions as to the actual health benefits of these actions.  Although such assessments are less common in air pollution epidemiology than studies of exposure-response, we argue that they are best equipped to meet the demands of a shifting regulatory environment fraught with questions surrounding the effectiveness of specific policies.  Of particular importance is the noted lack of direct evaluations of broad, complex regulatory interventions, which are of utmost relevance to policy debates \citep{health_effects_institute_proceedings_2010, van_erp_recent_2012}. % van_erp_progress_2012}

The analytic perspectives and statistical methods described in this report, namely, those rooted in potential-outcomes methods for causal inference, are particularly well suited to answering questions of direct accountability.  The purpose for distinguishing between direct and indirect accountability is not to highlight the need for ``causal'' versus ``associational'' evidence, as all research to provide such evidence shares the goal of establishing causality.  Rather, we argue that the shifting regulatory environment would be better informed by evidence of the effectiveness of specific control policies, and that traditional epidemiological approaches tailored to exposure-response estimation are not the most direct means to provide this evidence.  In an environment that brings skepticism and doubt about results drawn from observational data, analyzing specific interventions with approaches rooted in potential-outcomes thinking can clarify the basis for drawing causal inferences and bring a higher level of credibility to evidence used to support policy decisions \citep{dominici_particulate_2014, zigler_point:_2014}. 




\subsubsection{Potential Outcomes Methods: Framing as a Hypothetical (approximate) Randomized Experiment}
The underlying features of randomized studies that make them the ``gold standard'' for generating causal evidence remain pertinent to causal accountability assessment, with potential-outcomes methods framing observational studies according to how well they can approximate randomized experiments  \citep{rubin_for_2008, hernan_observational_2008}.  The key idea is to define a (possibly hypothetical) experiment consisting of an ``intervention condition'' and a ``control condition'' such that if populations could be randomly assigned to these conditions, differences in observed health outcomes would be interpreted as causal effects of the intervention.  While defining the intervention condition in accountability studies can be  straightforward (e.g., it will likely be a regulatory action that actually occurred), framing accountability as a hypothetical experiment forces the specification of some alternative action that might have otherwise occurred to serve as a relevant control condition.  This exercise formalizes the research question by explicitly defining a causal effect as a comparison between what would happen under well-defined competing conditions.  Hence the name of the potential outcomes paradigm; a causal effect of ``Action A'' relative to ``Action B'' is defined as the comparison of the \textit{potential outcome} if ``Action A'' were taken with the \textit{potential outcome} if ``Action B'' were taken.  Thus, the salient question for accountability is not ``Did health outcomes change after the intervention?'' but rather ``Are health outcomes different after the intervention than they would have been under a specific alternative action?''.   
For example, the case study in the \nameref{pm10nonattainment} Section attempts to answer the question: ``Are Medicare health outcomes in \PMTen nonattainment areas different than they would have been if the nonattainment designations had never occurred?''  The hypothetical treatment condition is the set of observed \PMTen nonattainment designations, while the hypothetical control condition is the setting where these areas were never designated.
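
In illustrative notation (introduced here only as a sketch, and not necessarily matching the notation of later sections), let $A_i \in \{0,1\}$ indicate whether unit $i$ is assigned to the control ($A_i = 0$) or intervention ($A_i = 1$) condition, and let $Y_i(a)$ denote the potential outcome that would be observed for unit $i$ under condition $a$.  The unit-level causal effect and one common population summary, the average causal effect, can then be written as
\begin{equation*}
\tau_i = Y_i(1) - Y_i(0), \qquad \tau = E\left[ Y_i(1) - Y_i(0) \right].
\end{equation*}
In the nonattainment example, $Y_i(1)$ would be the health outcome at location $i$ under the observed designations and $Y_i(0)$ the outcome had the designations never occurred.  Other contrasts, such as the average effect among designated areas only, can be defined analogously.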

Of utmost importance is that definition of the causal effect of interest is purely conceptual and conducted \textit{without regard to any assumed statistical model}.  Different models could be used to actually \textit{estimate} this effect, but the effect itself, along with its interpretation, remains consistent regardless of the modeling approach.  This clarity is essential for producing policy-relevant evidence.  Compare this to traditional epidemiological studies, which a) do not necessarily explicate an action defining effects of interest and b) define ``health effects'' with parameters (e.g., regression coefficients) in a statistical model. 

Estimating causal effects with comparisons between potential outcomes under competing intervention and control conditions is met with the fundamental problem that if the intervention is enacted, then outcomes under the control condition are unobserved. For example, evaluating the effect of the \PMTen nonattainment designations requires knowledge of what would have happened if the nonattainment designations had never occurred.  Hypothetical scenarios that never actually occurred are often referred to as ``counterfactual'' scenarios, and estimating what would have happened under such scenarios is perhaps the most important challenge for direct accountability assessment.  
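
This problem can be made explicit with a small amount of illustrative notation (ours for this passage only).  Suppose $A_i \in \{0,1\}$ denotes the condition actually received by unit $i$, and $Y_i(0)$ and $Y_i(1)$ denote its potential outcomes under the control and intervention conditions.  Assuming that the observed outcome equals the potential outcome under the condition actually received, the observed outcome is
\begin{equation*}
Y_i^{obs} = A_i \, Y_i(1) + (1 - A_i) \, Y_i(0),
\end{equation*}
so that exactly one of the two potential outcomes is observed for any given unit, and the other is the counterfactual quantity that must be estimated.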

Counterfactual scenarios have been explicitly considered, for example, in EPA cost-benefit analyses of the CAA mandated by Section 812 of the act, which project two counterfactual pollution scenarios: one that assumes past exposure patterns would have continued without the 1990 CAA Amendments and another that assumes an expected change in exposure patterns under full implementation of the 1990 Amendments.  These projections are coupled with exposure-response functions from the epidemiological literature to project counterfactual health scenarios that form the basis of the health-benefits analyses \citep{u.s._epa_benefits_2010, u.s._epa_environmental_2012}.  However, these counterfactual projections are not validated against studies of actual interventions, and thus are not sufficient for fully characterizing the relationships between regulatory strategies and health \citep{hei_accountability_working_group_assessing_2003}. 

Rather than project counterfactual scenarios by combining assumed exposure patterns with exposure-response estimates, potential-outcomes approaches typically use actual data from the ``control group'' of the hypothetical experiment to learn about what would have happened under the hypothetical control condition, rendering identification of a control population of vital importance.  When assessing the impact of regulatory intervention relative to what would have happened absent the intervention, control populations could be defined based on time (e.g., a population before promulgation of a regulation), or space (e.g., if some areas are subject to an intervention and others not).  Whether outcomes in the control population can actually characterize what would have occurred absent the intervention boils down to the familiar concept of confounding, although what constitutes a confounder of the effect of an intervention is slightly different from the common conception of a confounder as something that is related to both pollution exposure and health.  

For direct accountability, a comparison between outcomes among the intervention and control conditions is unconfounded if the two populations are comparable with regard to factors that relate to outcomes.  An unconfounded comparison of outcomes between the intervention and control conditions yields an estimate of the causal effect. If the two populations differ on important factors related to outcomes, then such a comparison is a convolution of differences due to the intervention and differences due to other factors.  Thus, if an important factor relating to health, for example, smoking behavior, is comparable across the intervention and control populations, then smoking behavior is not a confounder in the assessment of the intervention.  Compare this to the typical setting of exposure-response studies, where a ``confounder'' is generally regarded as a factor that is simultaneously associated with pollution exposure and health outcomes.  In both settings, the definition of a confounder is a factor that is associated with ``exposure'' and ``outcome,'' the key difference being that, in a direct accountability study, the ``exposure'' is actually the intervention.  
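
One common way to formalize an unconfounded comparison, stated here as a sketch in illustrative notation rather than as the precise assumption used in the case studies, is to require that assignment to the intervention be independent of the potential outcomes given a vector of observed covariates $X_i$:
\begin{equation*}
\left( Y_i(0), Y_i(1) \right) \independent A_i \mid X_i ,
\end{equation*}
where $A_i \in \{0,1\}$ indicates the intervention and $Y_i(a)$ is the potential outcome under condition $a$.  If, in addition, every unit has a positive probability of receiving either condition and the observed outcome $Y_i$ equals the potential outcome under the condition received, then
\begin{equation*}
E\left[ Y_i(a) \mid X_i = x \right] = E\left[ Y_i \mid A_i = a, X_i = x \right], \qquad a \in \{0,1\},
\end{equation*}
so that observed outcomes in the control population with covariates $x$ characterize what would have occurred absent the intervention among comparable units.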


\subsubsection{Methods for Confounding Adjustment: Propensity Scores}\label{propensityscores}
There are a variety of analytic tools available to address confounding in nonrandomized accountability studies.  Specialized study designs, often described as ``quasi experiments,'' circumvent the need to consider confounding directly, as they support assumptions that an intervention was ``quasi'' randomized in the sense that it is unrelated to health outcomes \citep{greenstone_quasi-experimental_2009, dominici_particulate_2014}.    Absent such specialized circumstances, methods for confounding adjustment (e.g., matching, weighting, stratification, or standardization) adjust for differences between intervention and control populations so that comparison groups can be regarded as similar on the basis of observed factors, thus mimicking the design of a randomized study.  

One broad class of methods for confounding adjustment relies on the {\it propensity score} \citep{rosenbaum_central_1983, robins_marginal_2000, rubin_for_2008, stuart_matching_2010}.  Propensity score methods share the same objective as, say, adjusting for covariates in a regression model, but have been shown to have several benefits relative to reliance on parametric regression models.  The propensity score, defined as the probability of receiving the intervention conditional on observed covariates, provides a dimension reduction in which multivariate confounding information is condensed into a one-number summary, the estimated propensity score.   Observations with similar values of the estimated propensity score can be regarded as similar on the basis of all of the covariates that were used for its estimation.  The estimated propensity score can then be used to adjust for confounding via matching, weighting, or subclassifying the observed sample to construct groups of treated and control observations that are comparable. If the propensity score model is adequately specified and there is no unmeasured confounding, then average outcomes in control observations represent what would have happened (on average) in treatment observations with similar values of the propensity score, just as would be the case in a randomized study.  Our analysis in the \nameref{pm10nonattainment} Section makes use of propensity scores for confounding adjustment, and we discuss additional considerations in the context of the analysis.   
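
As a brief sketch in illustrative notation (the symbols below are introduced only for this passage), let $A_i \in \{0,1\}$ indicate the intervention, $X_i$ the vector of observed covariates, $Y_i(a)$ the potential outcome under condition $a$, and $Y_i$ the observed outcome.  The propensity score is
\begin{equation*}
e(x) = \Pr\left( A_i = 1 \mid X_i = x \right),
\end{equation*}
and its central property is that it balances the covariates used to define it, $X_i \independent A_i \mid e(X_i)$ \citep{rosenbaum_central_1983}.  If there is no unmeasured confounding given $X_i$ and $0 < e(X_i) < 1$, the statement that control observations represent what would have happened in comparable treated observations can be written as
\begin{equation*}
E\left[ Y_i(0) \mid A_i = 1, e(X_i) = e \right] = E\left[ Y_i \mid A_i = 0, e(X_i) = e \right].
\end{equation*}
In practice $e(x)$ is unknown and is replaced by an estimate from a fitted model, so this equality holds only to the extent that the estimated score achieves the same balance.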

\subsubsection{``Causal Pathways'' Analyses: Causal Mediation Analysis and Principal Stratification}\label{causalpathways}
Another objective for causal accountability assessment that is in concert with the Chain of Accountability is to quantify the relative importance of the possible causal pathways that constitute links in the chain.  For example, one set of questions may relate to the extent to which the causal effect of an intervention on health outcomes acts ``through'' reducing ambient pollution, or there may be questions regarding the extent to which an intervention effect on ambient pollution is mediated through specific emissions. Figure \ref{pathwaysfig} presents a schematic representation of different causal pathways for accountability assessment.

Understanding the pathways through which an intervention affects ambient air quality or health outcomes is critical for informing policy decisions.  From this perspective, intermediate factors in the Chain of Accountability that lie between the regulatory action and human health response can be regarded as lying ``on the causal pathway.''  Because such intermediate outcomes are {\it posttreatment concomitant variables} that are expected to simultaneously be affected by the intervention and have bearing on outcomes, standard regression adjustments will not permit estimation of causal effects \citep{rosenbaum_consequences_1984}.  We consider two related causal frameworks for characterizing causal effects with intermediate variables: causal mediation analysis and principal stratification.  

\begin{figure}
\caption{Schematic description of direct and indirect causal pathways for accountability assessment.  Air quality interventions are typically intended to impact primary pollution and health outcomes through reducing specific emissions and/or ambient pollutants (indirect effects) but can, in reality, impact outcomes through other causal pathways (direct effects). }\label{pathwaysfig}
\includegraphics[width=\textwidth, trim=0in 1.25in 0in 2.5in, clip]{Figures/flowchart.pdf} %
\end{figure}

{\it Causal mediation analysis} is a framework designed to isolate specific causal pathways in order to assess whether the causal effect of an intervention on an outcome is mediated through the causal effect of the intervention on the intermediate variable \citep{vanderweele_conceptual_2009}.  Causal mediation analysis construes {\it two} hypothetical interventions in the context of accountability.  The first is the hypothetical intervention defining the ``treatment'' and ``control'' groups described above.  The second is an {\it additional} intervention that acts directly on the intermediate variable, independently of the first intervention.  In the example of Case Study 2, the first hypothetical intervention represents the presence of an \SOTwo scrubber on a coal-fired power plant, which is the actual regulatory intervention of interest.  The second intervention is purely hypothetical, and corresponds to a way in which power plant emissions could be manipulated independently of a scrubber.  Definition of potential outcomes based on these two interventions permits decomposition of the total causal effect of an intervention into {\it direct} and {\it indirect} effects \citep{robins_identifiability_1992}.  A direct effect in the power plant context corresponds to the causal effect of the scrubber that acts directly on ambient pollution in that it is attributable to causal pathways not involving emissions.  An indirect effect in this context is the effect of a scrubber on ambient pollution that can be attributed to emissions reductions caused by the scrubber.  Causal mediation analysis is explained and illustrated with more technical detail in the \nameref{arpcasestudy} Section.  
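
One standard way to write such a decomposition, given here as a sketch in illustrative notation (the definitions used for the case study appear in the \nameref{arpcasestudy} Section and may differ), uses so-called natural direct and indirect effects.  Let $a \in \{0,1\}$ denote the first intervention (e.g., scrubber presence), let $M_i(a)$ denote the intermediate variable (e.g., emissions) that would arise under $a$, and let $Y_i(a, m)$ denote the outcome (e.g., ambient pollution) that would arise if the first intervention were set to $a$ and the second intervention set the intermediate to $m$.  The total effect then decomposes as
\begin{align*}
\underbrace{Y_i(1, M_i(1)) - Y_i(0, M_i(0))}_{\text{total effect}}
&= \underbrace{Y_i(1, M_i(0)) - Y_i(0, M_i(0))}_{\text{direct effect}} \\
&\quad + \underbrace{Y_i(1, M_i(1)) - Y_i(1, M_i(0))}_{\text{indirect effect}} .
\end{align*}
The direct effect changes the intervention while holding the intermediate at the value it would take without the intervention, and the indirect effect changes only the intermediate from $M_i(0)$ to $M_i(1)$ while holding the intervention fixed.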

{\it Principal stratification} is a related approach for causal inference with intermediate variables that shares similar objectives to those of causal mediation analysis, but construes only one hypothetical intervention (i.e., the regulatory intervention being assessed) \citep{frangakis_principal_2002}.  Principal stratification in the accountability context aims to quantify the extent to which causal effects of an intervention on the primary outcome coincide with causal effects of the intervention on the intermediate variable \citep{zigler_estimating_2012}.  For example, Case Study 1 investigates the extent to which a causal effect of \PMTen nonattainment designation on Medicare health outcomes coincides with the causal effect of the designation on ambient pollution. Towards this end, principal stratification defines two types of causal effects: dissociative and associative causal effects.  In the nonattainment example, {\it dissociative effects} quantify the extent to which the intervention causally affects health outcomes when the intervention does not causally affect pollution. Dissociative effects are similar to direct effects in the mediation analysis framework in that they represent causal health effects of an intervention that are indicative of causal pathways other than ambient pollution.  {\it Associative effects} quantify the causal effect of the intervention on health when the intervention causally affects ambient pollution, and are similar to (but distinct from) indirect effects in the causal mediation analysis framework.  An associative effect that is large relative to the dissociative effect indicates that the causal effect of the intervention on health outcomes is greater in areas where pollution was causally affected than in areas where there is little or no effect of the intervention on pollution.  This would suggest the presence of a causal pathway whereby the intervention impacts health through changing pollution.  
Dissociative effects that are similar in magnitude to associative effects indicate that the health impact of the intervention is similar regardless of whether the intervention causally impacted ambient pollution, which suggests the presence of other causal pathways through which the intervention affects health without changing pollution.
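
In illustrative notation (a sketch introduced for this passage only, not the exact definitions used in the case studies), let $M_i(0)$ and $M_i(1)$ denote the intermediate variable (e.g., ambient pollution) for unit $i$ without and with the intervention, and let $Y_i(0)$ and $Y_i(1)$ denote the corresponding potential health outcomes.  Principal strata are groups of units defined by the pair $\left( M_i(0), M_i(1) \right)$, and average dissociative and associative effects can be written as
\begin{align*}
\text{dissociative:} \quad & E\left[ Y_i(1) - Y_i(0) \mid M_i(1) = M_i(0) \right], \\
\text{associative:} \quad & E\left[ Y_i(1) - Y_i(0) \mid M_i(1) \neq M_i(0) \right].
\end{align*}
Because the pair $\left( M_i(0), M_i(1) \right)$ is not affected by which condition is actually received, comparisons within these strata are causal effects.  With a continuous intermediate such as ambient pollution, exact equality is typically replaced by a requirement that $\left| M_i(1) - M_i(0) \right|$ fall below (dissociative) or above (associative) analyst-chosen thresholds.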

The theoretical and technical differences between principal stratification and mediation analysis have been closely examined in the causal inference literature in the setting of a single mediating factor \citep[e.g.,][]{vanderweele_simple_2008, joffe_related_2009, pearl_principal_2011, rubin_direct_2004}.   The \nameref{pm10nonattainment} Section illustrates the use of principal stratification in our analysis of \PMTen nonattainment designations.  The \nameref{arpcasestudy} Section provides technical details of new methods for both principal stratification and causal mediation analysis in the multipollutant context, and interprets both analyses in the context of the case study.

%that it would persist if air pollution were somehow intervened upon so that it remained fixed to a certain value.  That is, direct effects represent causal pathways that do not involve ambient pollution.  An indirect effect in this context corresponds to the causal effect of shifting ambient pollution from what it would be absent the intervention to what it would be under the intervention, while somehow holding the intervention fixed



%As a secondary objective, we also provide evidence regarding the existence of the anticipated causal pathway whereby the nonattainment designations impact health ``through'' reducing \PMTen.   Accordingly, we use principal stratification \citep{frangakis_principal_2002} to quantify the extent to which causal effects of the designations on hospitalization outcomes are 1) \textit{associative} with causal effects of the designations on \PMTen versus 2) \textit{dissociative} with causal effects of the designations on \PMTen.   An associative effect that is large relative to the dissociative effect indicates that the causal effect of the designations on hospitalization outcomes is greater in areas where pollution was causally reduced than in areas where there is little or no effect of the designations on pollution.  Dissociative effects that are similar in magnitude to associative effects indicate that the health impact of the intervention is similar regardless of whether the intervention decreased ambient pollution, which suggests the presence of other causal pathways through which the designations impact health without changing average ambient \PMTen during 1999-2001.  We revisit the interpretation of causal pathways with principal stratification in the Discussion.

\newpage
\section{Results}
\subsection{Case Study 1: Accountability Assessment of \PMTen Nonattainment Designations}\label{pm10nonattainment}
Here we employ the analytical perspectives outlined in the \nameref{causalinference} Section to provide the first direct accountability assessment of one integral regulatory strategy defined under the 1990 amendments to the CAA: the initial designation of areas as nonattainment with the 1987 National Ambient Air Quality Standards (NAAQS) for \PMTen.   In contrast to initial efforts to examine the impact of the CAA as a whole \citep{hei_accountability_working_group_assessing_2003, u.s._epa_benefits_2010}, we provide a focused analysis of the initial \PMTen nonattainment designations for two important reasons.  First, the process whereby the EPA sets NAAQS and initiates nonattainment designations is one integral tool for managing air quality under the CAA, and quantifying the effects of this specific decision process can prove valuable.  Second, a focused characterization of the health impact of a specific set of regulatory decisions yields more targeted evidence for informing future policies than do overall assessments of the CAA.  

Our analysis differs from traditional epidemiological investigations of the long-term association between pollution exposure and health in that we adopt an analytical perspective designed specifically to estimate causal effects of a specific set of actions, the nonattainment designations, rather than characterize the exposure-response relationship between pollution and health in the time frame that saw myriad regulatory actions contributing to improved air quality.  Specifically, we use a principled causal-inference framework to assess whether the initial \PMTen nonattainment designations caused improvements in Medicare health outcomes.  In accordance with the Chain of Accountability, we view changes in ambient \PMTen as intermediates on the causal pathway between regulatory decisions and health outcomes, representing three key features of the chain: regulatory action, ambient air quality, and human health response \citep{hei_accountability_working_group_assessing_2003}.  In addition to estimating overall impacts of nonattainment designations on ambient \PMTen and Medicare health outcomes, our approach provides additional information as to the relative importance of different causal pathways through which regulatory decisions may impact health. 

\subsubsection{Linked Data Sources}
We assembled a national, linked database using the tools described in the \nameref{datasection} Section to conduct our investigation.  The study population consists of US Medicare beneficiaries living within 6 miles of a \PMTen monitoring location in 2001.  The locations used for the analysis are those EPA monitoring stations located in the Western US that were in operation at any point between 1990 and 2001 (see Figure \ref{maps}\subref{raw}).  The Western region was chosen because virtually all initial nonattainment designations for \PMTen occurred in this part of the country.  From the EPA Green Book, we enumerated every county in the US designated as nonattainment for \PMTen between 1991 and 1995. Annual average ambient \PMTen concentrations from 1990-2001 were obtained from pollution monitor locations in the EPA AQS database.  Annual average \PMTen concentrations were regarded as missing if the percentage of valid measurements was less than 67\%. Health data were assembled from the Centers for Medicare and Medicaid Services (CMS) Medicare Part A hospital claims and enrollment data.  From the CMS enrollment file, we enumerated all Medicare beneficiaries residing in a zip code within 6 miles of a pollution monitor during 2001.  Beneficiaries living within 6 miles of multiple monitors are linked to the monitor closest to their zip code of residence.  Data available on Medicare beneficiaries include basic demographic information, mortality information, and hospitalization records.  Hospital billing claims data were used to identify hospitalizations for cardiovascular (CVD)-related and respiratory-related illness.  CVD-related hospitalizations were defined as those having ICD9-CM codes 390.xx to 495.xx.  Respiratory-related hospitalizations were defined as those relating to COPD (ICD9-CM 490.xx to 492.xx) or respiratory tract infections (ICD9-CM 464.xx to 466.xx and 480.xx to 487.xx).   
Individual-level health data are aggregated to the level of the monitoring location to yield average demographic information (average age, \% female, etc.) and outcome rates (mortality rate, hospitalization rates) for all beneficiaries living within 6 miles of each monitoring location.

We augment the pollution-health linked database with county-level information from the 2000 US census and from the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System (CDC-BRFSS).  County-level information includes population demographics, urbanicity, and smoking rates.  Additionally, county-level values of annual average daily maximum temperature in 1990 were obtained by averaging across monitoring stations available from the National Climatic Data Center.  Table \ref{datatab} summarizes the linked information obtained in the database.  For our analysis, data are considered at the monitor level, that is, for each monitoring location, we have a specific location (latitude and longitude), measures of ambient pollution, demographic characteristics of the county containing the monitor, and aggregated health information on all Medicare beneficiaries residing within a 6-mile radius.  The initial analysis data set contains the 547 monitoring locations depicted in Figure \ref{maps}\subref{raw}, with health data comprising information on 3,971,610 Medicare beneficiaries.  Among these 547 locations, 268 are located in nonattainment areas, corresponding to 2,349,691 Medicare beneficiaries.  Note that monitoring locations with fewer than 20 Medicare beneficiaries residing within 6 miles in 2001 were excluded. 

The outcome variables for our analysis are: 1) the average annual ambient concentration of \PMTen during 1999-2001; 2) the all-cause mortality rate (\# of deaths per beneficiary); 3) the CVD-related hospitalization rate (\# of hospitalizations per person-year); and 4) the respiratory-related hospitalization rate (\# of hospitalizations per person-year).  Person-years are used in the analysis of the hospitalization outcomes to account for the fact that beneficiaries can die or un-enroll from Medicare during the year, and hospitalization records are only available during the time period of enrollment.  In contrast, mortality is known regardless of Medicare enrollment status.  All other variables listed in Table \ref{datatab} are considered covariates in our analysis.  Note that some covariate values are measured after the nonattainment designations: census variables are from the 2000 census, and Medicare demographic data are measured in 2001.  We assume that such variables are not affected by the nonattainment designations and as such are reliable proxies for the same quantities in the years preceding the nonattainment designations.  

%The only notable exception is the use of average \PMTen during 1990-1994 as a baseline pollution measurement.  This choice was primarily motivated by limited data availability in earlier years, as AQS data are only available back to 1990 and many monitors in our study sample were not in operation in the earliest years of the 1990s. 

Our analysis is confronted with two types of missing ambient pollution data.  First, 284 locations (131 nonattainment) had missing \PMTen measurements in 1990.  Missing values of average annual ambient pollution in 1990 were singly imputed using posterior mean predictions from a spatial hierarchical random effects model as described in Appendix \ref{app:pm10:missing}.  The second type of missing pollution data is the average annual ambient \PMTen concentration during 1999-2001, which was missing for 157 (70 nonattainment) locations. These follow-up pollution measures are used as outcomes in our analysis, and are multiply imputed from the models described in the \nameref{pm10statmodels} Section as a byproduct of our Bayesian estimation procedure.  Missing values of county-level average annual daily maximum temperature were imputed for 61 (12 nonattainment) locations using $k$-nearest-neighbor mean imputation.   All other covariates and outcomes listed in Table \ref{datatab} were fully observed.

\begin{figure}
\caption{All 547 \PMTen monitoring locations available for analysis (a) and the 495 locations retained after propensity score pruning (b).}\label{maps}
\subfloat[Entire Monitor Set]{
\includegraphics[width=.5\textwidth]{Figures/monitormap_raw.pdf}\label{raw}}
\subfloat[Pruned Monitor Set]{
\includegraphics[width=.5\textwidth]{Figures/monitormap_pruned.pdf}\label{pruned}}
\end{figure}


\begin{table}
\caption{Summary statistics for covariates and outcomes available for the analysis of \PMTen nonattainment designations.  Variables marked with $^*$ are those included in the model that estimates the propensity score and for additional covariate adjustment in models for pollution and health outcomes.}\label{datatab}
\begin{tabular}{lcccc} \hline \hline
 & \multicolumn{2}{c}{\underline{Attainment Areas}} & \multicolumn{2}{c}{\underline{Nonattainment Areas}} \\
  & Mean & SD & Mean & SD \\ \hline
  
\underline{Monitor Data} \\
Ambient PM$_{10}$ 1990$^*$&26.36&6.55&39.2&12.75\\ 

 \\ \underline{Medicare Data} \\
Medicare Beneficiaries$^*$&5813.33&9579.5&8767.5&13387.71\\ 
Age$^*$&74.77&1.1&74.7&1.31\\ 
\% Female$^*$&0.55&0.05&0.56&0.06\\ 
\% White$^*$&0.9&0.14&0.87&0.14\\ 
\% Black$^*$&0.01&0.03&0.03&0.05\\ 

 \\ \underline{County-Level Data} \\
Population$^*$&889937.31&1472176.42&3380578.87&5125807.84\\ 
Housing Density$^*$&0.42&0.08&0.4&0.08\\ 
\% Urban Living$^*$&0.72&0.23&0.85&0.19\\ 
Median Income$^*$&42148.87&10415.55&40873.47&7764.82\\ 
\% High School Graduates$^*$&0.84&0.06&0.8&0.09\\ 
5-Year Migration Rate$^*$&0.23&0.06&0.21&0.06\\ 
Smoking Rate$^*$&0.18&0.06&0.2&0.04\\ 
Annual Maximum Temperature$^*$&65.75&6.6&72.39&10.14\\ 
\% Hispanic&0.16&0.14&0.25&0.2\\ 
\% White&0.73&0.18&0.64&0.21\\ 
\% Black&0.02&0.03&0.04&0.03\\ 
\% Female&0.5&0.01&0.5&0.01\\ 

\\ \underline{Pollution and Health Outcomes} \\
Ambient PM$_{10}$ 1999-2001&21.58&6.43&31.56&13.28\\ 
Mortality Rate&62.58&16.95&62.51&12.4\\ 
Hospitalization Rate: CVD&83.74&24.24&92.09&26.65\\ 
Hospitalization Rate: Respiratory&28.39&17.05&28.41&12.78\\ 
\end{tabular}
\end{table}

\subsubsection{Defining the Intervention for Direct Accountability Assessment: Initial \PMTen Nonattainment Designations}
As a result of the 1990 amendments to the CAA, EPA began officially designating US counties as nonattainment for \PMTen if 1) at least one pollution monitor in the county indicated a violation of the 1987 NAAQS for \PMTen during the three previous years or 2) part of the county was thought to contribute to a violation of the NAAQS for \PMTen in another area during that period.  The first nonattainment designations occurred in 1991, and in the Western US considered in our case study, additional areas were designated as nonattainment through 1995 (see Figure \ref{trackpoll}).    A county nonattainment designation induced the state containing that county to submit a State Implementation Plan (SIP) detailing a strategy to achieve the NAAQS by a target date, and all counties not designated as nonattainment are considered in attainment and not required to produce a SIP.  The initial target date for nonattainment areas to achieve the NAAQS was the end of 1994, but subsequent determination that some areas could not realistically attain the NAAQS in this time frame prompted the EPA to designate some areas as ``serious'' nonattainment and extend the target date to the end of 2001.  For the purposes of our analysis, we consider the ``initial'' nonattainment designations to consist of any such designation that occurred between the years 1991-1995, as the distinguishing features between areas designated in 1991 and those in the next few years are thought to relate more to procedural issues and availability of data, not to air quality per se.


\subsubsection{Potential Outcomes Approach and Causal Effects of Interest}
The overall goal is to estimate causal effects of the initial nonattainment designations on Medicare outcomes in 2001.  Importantly, the salient question is not ``Did air pollution and health outcomes change during the time following the nonattainment designations?'', but rather ``Are air quality and health outcomes different after the nonattainment designations than they would have been if the designations had not occurred?''.  

More formally, we define the causal effect of interest as the comparison between two sets of potential outcomes: those that would occur if areas were designated as nonattainment for \PMTen in 1991-1995 and those that would occur if the nonattainment designations had not occurred.  Note that we forego the use of potential outcomes notation in the analysis of this case study (more formal use of potential outcomes notation appears in the \nameref{arpcasestudy} Section).  We consider the comparison of potential outcomes only among the locations that were actually designated as nonattainment, that is, the estimand of interest is what is known in the causal inference literature as the average treatment effect on the treated (ATT).  Thus, we refine the causal question slightly to be ``Among areas that were designated as nonattainment in 1991-1995, what were the causal effects of these designations?''
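
For readers who find notation helpful, the estimand just described can be sketched as follows.  The symbols are introduced here for illustration only and are not used in the remainder of this case study: $A_i = 1$ if location $i$ was designated nonattainment in 1991-1995 and $A_i = 0$ otherwise, and $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes at location $i$ with and without the designation, respectively.
\[
\mathrm{ATT} \;=\; \mathrm{E}\left[\, Y_i(1) - Y_i(0) \mid A_i = 1 \,\right].
\]
For designated locations, $Y_i(1)$ is observed and $Y_i(0)$ is not, so the second term must be inferred from comparable attainment locations.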

As a secondary objective, we also aim to characterize the existence of the anticipated causal pathway whereby the nonattainment designations impact health ``through'' reducing \PMTen in 1999-2001.  Because \PMTen in this context is an intermediate outcome that is expected to simultaneously be affected by the nonattainment designations and have bearing on the Medicare health outcomes, standard regression adjustment for ambient \PMTen in 1999-2001 will not permit estimation of causal effects \citep{rosenbaum_consequences_1984}.  Accordingly, we use principal stratification \citep{frangakis_principal_2002} to quantify the extent to which causal effects of the designations on health outcomes are 1) \textit{associative} with causal effects of the designations on \PMTen versus 2) \textit{dissociative} with causal effects of the designations on \PMTen (see the \nameref{causalpathways} Section).   An associative effect that is large relative to the dissociative effect indicates that the causal effect of the designations on Medicare health outcomes is greater in areas where pollution was causally reduced than in areas where there is little or no effect of the designations on pollution.  Dissociative effects that are similar in magnitude to associative effects indicate that the health impact of the intervention is similar regardless of whether the intervention decreased ambient pollution, which suggests the presence of other causal pathways through which the designations impact health without changing average ambient \PMTen during 1999-2001. Recall that causal reductions in pollution are defined based on whether pollution was lower with the nonattainment designation than it would have been without the designation, not based on whether pollution decreased across time.

As outlined in the \nameref{causalinference} Section, these types of causal questions can be framed as a hypothetical two-armed experiment with one ``intervention'' arm corresponding to the observed allocation of nonattainment designations and the other ``control'' arm corresponding to the hypothetical scenario with no nonattainment designations.  Potential outcomes in nonattainment areas are observed under the intervention condition representing the actual nonattainment designations.  Since potential outcomes are not observed under the hypothetical control condition without nonattainment designations, we use observed pollution and health outcomes in attainment areas that were not subject to SIP measures to characterize what would have happened in nonattainment areas had the nonattainment designations not occurred.  Thus, the attainment areas can be construed as a ``control group'' for studying the effect of the nonattainment designations. The obvious threat to validity of estimating causal effects of the nonattainment designations by comparing outcomes with attainment areas is that the designations are decidedly not randomly assigned, meaning that attainment areas may differ from nonattainment areas in important ways, as is evident from Table \ref{datatab}.  Using data on attainment areas to learn about what would have happened in nonattainment areas therefore requires careful confounding adjustment.


\subsubsection{Estimation of Propensity Scores for Confounding Adjustment}
The \textit{propensity score} \citep{rosenbaum_central_1983, stuart_matching_2010} is a nearly ubiquitous tool for adjusting for confounding to estimate causal effects with observational data that do not enjoy the benefits of randomization.  The motivation for the propensity score in our setting is to construct groups of attainment and nonattainment locations that are comparable with respect to the covariates listed in Table \ref{datatab} so that comparing outcomes in attainment and nonattainment areas does not suffer from confounding on the basis of these factors (see the \nameref{propensityscores} Section).  The key assumption we adopt for confounding adjustment is that of {\it strong ignorability}, namely that the covariates listed in Table \ref{datatab} constitute (or are proxies for) all factors that could confound comparisons between attainment and nonattainment areas.  This amounts to the familiar ``no unmeasured confounding'' assumption, and we revisit this assumption in the \nameref{pm10discussion} Section.  

We use a logistic regression for the probability of a nonattainment designation to estimate the propensity score, with predicted probabilities from this model representing the estimated propensity scores.  Covariates included in this propensity score model are those noted in Table \ref{datatab}.  If the propensity score model is adequately specified and the factors in Table \ref{datatab} comprise all of the relevant confounders (i.e., if there is no unmeasured confounding), then average outcomes in attainment areas represent what would have happened (on average) in nonattainment areas with similar values of the propensity score, just as would be the case in a randomized study. 
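
As a sketch of the model just described, with notation assumed here for illustration ($A_i$ the nonattainment indicator for location $i$, $X_i$ the vector of covariates marked in Table \ref{datatab}, and $\beta_0, \beta$ regression coefficients), the propensity score and its logistic regression form may be written as
\[
e(X_i) = \Pr(A_i = 1 \mid X_i), \qquad \log\frac{e(X_i)}{1 - e(X_i)} = \beta_0 + X_i^{\top}\beta ,
\]
with the estimated propensity score $\hat{e}(X_i)$ obtained by substituting the fitted coefficients.  Whether transformations or interactions of the covariates enter the linear predictor is not specified in this section, so the main-effects form above is an assumption.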

Figure \ref{pshists}\subref{raw} depicts the distribution of estimated propensity scores for all 547 attainment and nonattainment locations.  As expected, locations designated as nonattainment tend to have higher estimated propensity scores, but note that a wide range of estimated propensity scores are represented by both attainment and nonattainment areas.  However, note in Figure \ref{pshists}\subref{raw} that locations with estimated propensity scores greater than 0.98 are exclusively nonattainment locations, with no attainment location having estimated propensity score in this range.  This phenomenon, where areas of the estimated propensity score distribution only have representation from either the treated or the control group, is sometimes referred to as a lack of ``overlap'' between propensity score distributions \citep{crump_dealing_2009}.  Areas of the propensity score distribution that exhibit lack of overlap are not appropriate for making causal inferences, and observations lying in non-overlapping portions of the propensity score distribution should be removed or ``pruned'' from the analysis data set to prevent model-based extrapolation beyond the range of observed data \citep{ho_matching_2007, crump_dealing_2009}.  

In this instance, 49 nonattainment locations had estimated propensity scores greater than 0.98, that is, these observations did not ``overlap'' with estimated propensity scores from attainment areas.  The implication of this lack of overlap is that each of these 49 nonattainment areas exhibits a constellation of the covariates in Table \ref{datatab} that does not resemble that of \textit{any} attainment area, leaving no observed information to learn about what would have happened in these areas had they not been designated. Put another way, these observations lack appropriate ``control'' observations.  Analogously, 3 attainment areas had estimated propensity scores that did not overlap with those estimated in nonattainment areas. This lack of overlap is not surprising, as we would expect that, for example, there are areas of California's Central Valley having population demographics and  pollution levels that do not resemble those of \textit{any} other part of the country.  For estimation of causal effects that do not rely on model-based extrapolation of the confounding adjustment, we discard the 52 observations without overlapping propensity scores to yield a pruned analysis sample that includes 495 monitoring locations, 219 of which lie in nonattainment areas \citep{king_dangers_2006, ho_matching_2007}.  This reduces the study population to 3,555,934 Medicare beneficiaries.  See Figure \ref{maps}\subref{pruned} for the locations of the remaining monitoring locations.  Appendix \ref{app:pm10:discard} displays the locations of the discarded observations.  Note that, strictly speaking, discarding such observations means that estimates from our analysis cannot technically be regarded as ATTs, as subsequent estimates only pertain to the subset of retained nonattainment locations.
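
The sample sizes reported above can be checked directly.  One common way to formalize this kind of pruning (an assumption here, since the exact rule is not stated in this section) is to retain location $i$ only if its estimated propensity score $\hat{e}_i$ lies within the range of estimated scores observed in the opposite designation group.  The resulting counts are
\begin{align*}
547 - (49 + 3) &= 495 \quad \text{retained locations},\\
495 - 219 &= 276 \quad \text{retained attainment locations},
\end{align*}
which agree with the row totals implied by Table \ref{psgrps}.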

After pruning the sample as described above, confounding adjustment is accomplished by grouping locations together based on the values of the estimated propensity score, as locations with similar values of the propensity score can be regarded as similar on the basis of all observed confounders. We classify the pruned analysis sample into 5 subgroups based on the quintiles of the estimated propensity score, each subgroup containing attainment and nonattainment locations that have similar values of the propensity score (i.e., are comparable with regard to the factors in Table \ref{datatab}).  Table \ref{psgrps} lists the number of attainment and nonattainment areas in each propensity score subgroup.  Conditional on estimates of the propensity score, any models for pollution and health outcomes can be used to estimate causal effects in a manner that is much less susceptible to observed confounding \citep{ho_matching_2007}.  
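
To illustrate how subclassification can target an effect among the designated areas, consider a simple subclassification estimator.  This sketch is for intuition only; the estimates reported in this case study come from the regression models described in the \nameref{pm10statmodels} Section, not from this formula.  With notation assumed for illustration, let $n_{1k}$ be the number of nonattainment locations in subclass $k$, $n_1 = \sum_k n_{1k}$, and $\bar{Y}_{1k}$ and $\bar{Y}_{0k}$ the average outcomes among nonattainment and attainment locations in subclass $k$:
\[
\widehat{\mathrm{ATT}}_{\mathrm{sub}} \;=\; \sum_{k=1}^{5} \frac{n_{1k}}{n_{1}} \left( \bar{Y}_{1k} - \bar{Y}_{0k} \right).
\]
Using the counts in Table \ref{psgrps}, $n_1 = 219$ and the subclass weights $n_{1k}/n_1$ are
\[
\tfrac{12}{219},\ \tfrac{21}{219},\ \tfrac{28}{219},\ \tfrac{66}{219},\ \tfrac{92}{219} \;\approx\; 0.05,\ 0.10,\ 0.13,\ 0.30,\ 0.42 ,
\]
so the two highest subclasses carry roughly 72\% of the weight, and the highest subclass relies on only 7 attainment locations for comparison.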



\subsubsection{Checking Covariate Balance}
One essential benefit of propensity scores is the ability to check the extent to which grouping observations based on estimated propensity scores ``works'' in the sense that it ensures the construction of groups of attainment and nonattainment locations that are in fact comparable on the basis of the factors in Table \ref{datatab}.  If covariates are balanced between attainment and nonattainment areas within propensity score subclass, the potential for these covariates to confound the analysis of causal effects is greatly reduced.  One common metric for checking covariate balance is the standardized difference between attainment and nonattainment observations \citep{stuart_matching_2010}. This quantity can be calculated for each covariate as a way to summarize whether a covariate is in fact balanced between attainment and nonattainment areas, with values closer to zero indicating better average balance (and less susceptibility to confounding).  We calculated standardized differences for each covariate before employing the propensity score, among all 547 monitoring locations in the original data set, and depict these values with the red line in Figure \ref{balcheck}.  These standardized differences in the unadjusted sample can be regarded as a measure of the potential for bias in causal effect estimates due to differences between attainment and nonattainment locations.  We also calculate the standardized difference for each covariate within propensity score subclass as a measure of the covariate similarity among attainment and nonattainment locations with similar values of the propensity score. The blue line of Figure \ref{balcheck} plots, for each covariate, the average of this value across the 5 propensity score subclasses.  
The conclusion from Figure \ref{balcheck} is that, despite the stark differences between attainment and nonattainment locations in the entire sample (red line), the propensity score does an adequate job of balancing the covariates in Table \ref{datatab} between attainment and nonattainment locations, within propensity score subclass (blue line).  
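
One common form of the standardized difference for covariate $j$, with $\bar{x}_{j1}, s_{j1}$ and $\bar{x}_{j0}, s_{j0}$ denoting the mean and standard deviation among nonattainment and attainment locations, is
\[
d_j \;=\; \frac{\bar{x}_{j1} - \bar{x}_{j0}}{\sqrt{\left(s_{j1}^2 + s_{j0}^2\right)/2}} .
\]
The choice of denominator (pooled as here, or the standard deviation among nonattainment locations alone) is an assumption, as the exact definition is not given in this section.  As a worked example under this definition, the unadjusted summary statistics for ambient \PMTen in 1990 from Table \ref{datatab} give
\[
d \;=\; \frac{39.20 - 26.36}{\sqrt{\left(12.75^2 + 6.55^2\right)/2}} \;\approx\; \frac{12.84}{10.14} \;\approx\; 1.27 ,
\]
that is, a difference in means of more than one pooled standard deviation before any propensity score adjustment.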

\begin{figure}
\caption{Histograms of estimated propensity scores for attainment and nonattainment areas before and after pruning observations with non-overlapping propensity score estimates}\label{pshists}
\subfloat[Full Monitor Set, 547 locations]{
\includegraphics[width=.5\textwidth]{Figures/pshists_raw.pdf}\label{raw}}
\subfloat[Pruned Monitor Set, 495 locations]{
\includegraphics[width=.5\textwidth]{Figures/pshists_pruned.pdf}\label{pruned}}
\end{figure}

\begin{table}
\centering
\caption{Number of attainment and nonattainment areas in each of the five propensity score subclasses used for confounding adjustment}\label{psgrps}
\begin{tabular}{lccccc} \hline \hline
 & \multicolumn{5}{c}{\underline{Propensity Score Quintile}}  \\
 & 1$^{st}$ & 2$^{nd}$ & 3$^{rd}$ & 4$^{th}$ & 5$^{th}$ \\ \hline
Attainment & 87 & 78 & 71 & 33 & 7 \\
Nonattainment & 12 & 21 & 28 & 66 & 92 \\ \hline
Total & 99 & 99 & 99 & 99 & 99
\end{tabular}
\end{table}

\begin{figure}
\centering
\caption{Description of covariate balance before (red line) and after (blue line) propensity score subclassification, as summarized by average standardized differences between nonattainment and attainment areas across each available covariate.}\label{balcheck}
\includegraphics[width=.5\textwidth]{Figures/balplot.pdf}
\end{figure}

\subsubsection{Models for Estimating Causal Effects}\label{pm10statmodels}
Note that none of the preceding discussion of propensity scores involves Medicare health outcomes, nor does it pertain to any particular statistical model for actually estimating causal effects.  Rather, we have only formalized the causal effects of interest (ATTs), formulated the relevant ``treatment'' and ``control'' groups, and employed propensity scores to construct an analysis data set that will serve as the basis for estimating causal effects.  Conditional on estimates of the propensity score, we used parametric models for potential outcomes under attainment and nonattainment designations to predict potential outcomes that are not observed, namely, the potential pollution and health outcomes that would have occurred in nonattainment areas had the designations never occurred.  Insofar as these predictions can be regarded as an accurate reflection of what would have happened absent the designations, they can be used to estimate causal effects.  Our analysis relies on two such models: 1) a spatial hierarchical regression model for (log-transformed) ambient \PMTen concentrations during 1999-2001 and 2) log-linear Poisson regression models for each of the Medicare mortality and hospitalization outcomes.  All regression models adjust for propensity score subclasses and also for individual variables from Table \ref{datatab} to adjust for any residual confounding not accommodated by the propensity score model and to improve efficiency.  See \cite{zigler_estimating_2012} and Appendix \ref{app:pm10} for details of these model specifications, the Markov chain Monte Carlo (MCMC) procedure used for estimation, and further technical detail.
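
A generic form of the second type of model can be sketched as follows.  The notation is assumed here for illustration: $Y_i$ is the count of events (deaths or hospitalizations) among beneficiaries near location $i$, $N_i$ is the corresponding number of beneficiaries or person-years at risk (entering as an offset), $A_i$ is the nonattainment indicator, $S_i \in \{1, \ldots, 5\}$ is the propensity score subclass, and $X_i$ is the vector of covariates from Table \ref{datatab}.
\[
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i \;=\; \log N_i + \alpha_0 + \alpha_1 A_i + \sum_{k=2}^{5} \gamma_k \mathbf{1}\{S_i = k\} + X_i^{\top}\beta .
\]
This display is a simplified sketch rather than the fitted specification, which is given in Appendix \ref{app:pm10}; the fitted models may differ, for example in how ambient pollution enters the health models.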


\subsection{Case Study 1 Results}
Figure \ref{trackpoll} depicts the annual average ambient \PMTen for each of the 547 monitors for the years 1990 - 2001.  As expected, ambient \PMTen in the early 1990s tended to be higher in nonattainment areas, but both attainment and nonattainment areas had ambient \PMTen below the NAAQS during this time frame.  Also note that ambient average \PMTen is decreasing similarly in both attainment and nonattainment areas.

\subsubsection{Unadjusted Comparisons}
Using data on the entire sample of 547 monitoring locations indicates that, between the baseline and follow-up time periods, average ambient \PMTen decreased by 8.8 $\mu g/m^3$ in nonattainment areas, from 40.4 in 1990 to 31.6 in 1999-2001.  The analogous decrease in attainment areas was smaller:  5.4 $\mu g/m^3$, from 27.0 in 1990 to 21.6 in 1999-2001. A two-sample $t$-test comparing these changes gave $p < 0.001$.  Among Medicare beneficiaries residing near one of the 547 monitors, the average rate of all-cause mortality (per 1000 person-years) in 2001 was similar in nonattainment and attainment areas (62.5 vs. 62.6, two-sample $t$-test $p = 0.952$).  The average rate of CVD-related hospitalizations (per 1000 person-years) in 2001 was higher in nonattainment areas (92.1 vs. 83.7, two-sample $t$-test $p < 0.001$). Average rates of respiratory-related hospitalizations were similar in nonattainment areas and attainment areas (28.4 vs. 28.4, $p=0.991$).  Note that these unadjusted comparisons are likely confounded due to differences between attainment and nonattainment areas. 
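
Written out explicitly, the unadjusted contrast in changes in average ambient \PMTen between nonattainment and attainment areas is
\[
(31.6 - 40.4) - (21.6 - 27.0) \;=\; -8.8 - (-5.4) \;=\; -3.4 \ \mu g/m^3 .
\]
This is a purely descriptive difference in changes and, for the reasons noted above, should not be read as an estimate of the causal effect of the designations.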

% A simple regression model for (log-transformed) ambient \PMTen during 1999-2001 that adjusts for nonattainment status and the variables listed in Table \ref{datatab} also indicates that \PMTen decreased similarly in the two groups, with a mean difference in change from baseline to follow up of $0.33 (p=0.97)$.  
%THERE WAS A DIFFERENCE WHEN USING DATA FROM 1987-1989: MIGHT WANT TO GO BACK TO USING 1990, BECAUSE IT APPEARS THAT A LOT OF THE ACTION WAS REALLY IN THE EARLY 1990s.


%Simple log-linear poisson regression models for these health outcomes that adjust for nonattainment status and the variables listed in Table \ref{datatab} indicates adjusted mean differences in outcome rates of $-0.112 (p=0.74)$, $-3.02 (p<0.001)$, and $-2.63 (p<0.001)$ for mortality rate, CVD-related hospitalization rate, and respiratory-related hospitalization rate, respectively.  

\begin{figure}
\caption{Trends in annual average ambient \PMTen: 1990 - 2001.  Thin lines represent individual monitoring locations, thick lines represent the average across all locations. The number listed for each year is the total number of nonattainment areas in that year.}\label{trackpoll}
\includegraphics[width=.8\textwidth]{Figures/trackpoll1990_2001.pdf}
\end{figure}


\subsubsection{Average Causal Effects on Average Annual Ambient \PMTen during 1999-2001}
Using the propensity score approach outlined above and confining interest to the ATT among the 219 nonattainment areas in the pruned sample, we estimate the causal effect of the nonattainment designations on average ambient \PMTen during 1999-2001 using the spatial hierarchical model outlined in Appendix \ref{app:pm10:models}, adjusted for the propensity score subclass and the variables in Table \ref{datatab}.  The estimated causal effect of the nonattainment designations on average ambient \PMTen during 1999-2001 is $-1.17$ $\mu g/m^3$, with 95\% posterior interval $(-7.33, 4.00)$.  This indicates that, among these 219 areas, average ambient \PMTen during 1999-2001 was slightly lower than it would have been if the nonattainment designations had not occurred, that is, there is some evidence that the nonattainment designations had a causal effect on 3-year average ambient \PMTen during 1999-2001.  The small magnitude of this estimate and its wide posterior interval highlight the possibility that decreases in this measure of \PMTen during this time frame (evident from Figure \ref{trackpoll}) are likely due in part to factors affecting both attainment and nonattainment areas. 

\subsubsection{Average Causal Effects on Medicare Health Outcomes}
For the Medicare health outcomes, we used the propensity score approach outlined above and the models outlined in \cite{zigler_estimating_2012} and Appendix \ref{app:pm10:models} to estimate ATTs among the 219 nonattainment locations in the pruned sample.  Models used for predicting potential outcomes adjust for propensity score subclass and the variables indicated in Table \ref{datatab}.  Figure \ref{cehplot} summarizes posterior distributions of the average causal effects of the nonattainment designations on Medicare mortality, CVD-related hospitalization, and respiratory hospitalization rates (per 1000 beneficiaries).  For all-cause mortality, the posterior mean ATT was $-1.08$ deaths per 1000 Medicare beneficiaries, with 95\% posterior interval $(-3.27, 0.99)$, suggesting that the nonattainment designations caused a decrease in mortality (i.e., that the average mortality rate in nonattainment areas was 1.08/1000 lower in 2001 than it would have been had these areas not been designated nonattainment).  For CVD hospitalizations, the posterior mean ATT was 1.44 hospitalizations per 1000 person-years, with 95\% posterior interval $(-4.64, 7.16)$, which does not indicate any causal effect on CVD-related hospitalizations among Medicare beneficiaries relative to what would have occurred without the designations.  For respiratory hospitalizations, the posterior mean (95\% interval) average causal effect of the nonattainment designations was $-1.47$ $(-3.86, 0.70)$ hospitalizations per 1000 person-years, indicating that the nonattainment designations causally reduced respiratory-related hospitalizations, relative to what would have occurred without the designations.  Note that all of the ATT estimates had 95\% uncertainty intervals that include 0, and that the analysis of CVD-related hospitalizations exhibits the most uncertainty.


\subsubsection{Associative and Dissociative Effects}
To provide some insight into the existence of the anticipated causal pathway whereby the nonattainment designations decrease \PMTen which, in turn, causes improvement in Medicare health outcomes, we use the same models described above (conditional on the propensity score and covariates in Table \ref{datatab}) to estimate associative and dissociative effects.  Dissociative effects in this context are the causal effects of nonattainment designations on health outcomes among locations that are estimated to have experienced little or no causal effect on ambient \PMTen during 1999-2001, i.e., where the estimated causal effect on this measure of \PMTen is less than 5 $\mu g/m^3$.  Associative effects in this context are the causal effects of nonattainment designations on health outcomes among locations where ambient \PMTen during 1999-2001 is estimated to have decreased substantially as a result of the designation, i.e., where the estimated causal effect on this measure of \PMTen is a decrease of at least 5 $\mu g / m^3$.  
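
The two effects described above can be sketched in notation that is assumed here for illustration only: $M_i(a)$ denotes average ambient \PMTen during 1999-2001 at location $i$ under designation status $a$ (in $\mu g/m^3$), and $Y_i(a)$ denotes the corresponding health outcome.
\begin{align*}
\text{Dissociative:} \quad & \mathrm{E}\left[\, Y_i(1) - Y_i(0) \;\middle|\; \bigl| M_i(1) - M_i(0) \bigr| < 5 \,\right], \\
\text{Associative:} \quad & \mathrm{E}\left[\, Y_i(1) - Y_i(0) \;\middle|\; M_i(1) - M_i(0) \le -5 \,\right],
\end{align*}
with both expectations taken over the retained nonattainment locations.  Reading ``little or no causal effect'' as a bound on the absolute value of the effect on \PMTen is an interpretation adopted for this sketch.  Because $M_i(0)$ is never observed for designated locations, membership in either group is itself estimated from the models rather than observed.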

For the mortality outcome, the posterior mean (95\% interval) average dissociative effect is $-1.90$ $(-5.52, 1.87)$ deaths per 1000 beneficiaries, providing some evidence that mortality was reduced even in areas where \PMTen during 1999-2001 was not causally affected.  The posterior mean (95\% interval) associative effect is $-0.46$ $(-4.03, 2.64)$, indicating no evidence of a causal effect on mortality in locations where \PMTen is estimated to have been causally decreased by more than 5 $\mu g /m^3$. 

For the CVD-related hospitalization outcome, the posterior mean (95\% interval) average dissociative effect is 2.83 $(-5.84, 11.01)$ hospitalizations per 1000 person-years, providing no evidence that CVD hospitalizations were causally affected in areas where \PMTen during 1999-2001 was not causally affected, with a point estimate suggestive of an increase in CVD hospitalizations in these areas.  The posterior mean (95\% interval) associative effect is $-3.78$ $(-11.69, 3.79)$, which provides little evidence of a causal reduction in CVD hospitalization in locations where \PMTen is estimated to have been causally decreased by more than 5 $\mu g /m^3$.  Note the wide uncertainty intervals for both dissociative and associative effects for CVD-related hospitalizations. 

For the respiratory-related hospitalization outcome, the posterior mean (95\% interval) average dissociative effect is $-0.31$ $(-3.84, 3.18)$ hospitalizations per 1000 person-years, providing no evidence that respiratory hospitalizations were causally affected in areas where \PMTen during 1999-2001 was not causally affected.  In contrast, the posterior mean (95\% interval) associative effect is $-3.34$ $(-7.43, 0.67)$, indicating a causal reduction in respiratory hospitalizations in locations where \PMTen is estimated to have been causally decreased by more than 5 $\mu g /m^3$.  Among the three outcomes, only the analysis of respiratory hospitalizations indicates an associative effect that is larger in magnitude than the dissociative effect, which is suggestive of the anticipated causal pathway; respiratory hospitalizations are not estimated to have been affected in areas where \PMTen was not substantially affected, and these hospitalizations are estimated to have been causally reduced in areas where \PMTen was causally reduced relative to what would have occurred without the nonattainment designations.



\begin{table}
\caption{Causal Effect Estimates for overall, associative, and dissociative effects in the analysis of \PMTen nonattainment designations.}\label{cetab}
\small
\begin{tabular}{lccccccccc} \hline \hline
 & \multicolumn{3}{c}{\underline{Overall Average Causal Effect}} & \multicolumn{3}{c}{\underline{Average Dissociative Effect}}  & \multicolumn{3}{c}{\underline{Average Associative Effect}} \\
  & Mean & 2.5\% & 97.5\%& Mean & 2.5\% & 97.5\%& Mean & 2.5\% & 97.5\% \\ \hline
Ambient PM$_{10}$&$-1.17$&$-7.33$&$4.00$&&&&&&\\ 
All-Cause Mortality&$-1.08$&$-3.27$&$0.99$&$-1.90$&$-5.52$&$1.87$&$-0.46$&$-4.03$&$2.64$\\ 
CVD Hospitalization&$1.44$&$-4.64$&$7.16$&$2.83$&$-5.84$&$11.01$&$-3.78$&$-11.69$&$3.79$\\ 
Respiratory Hospitalization&$-1.47$&$-3.86$&$0.70$&$-0.31$&$-3.84$&$3.18$&$-3.34$&$-7.43$&$0.67$\\ 
\end{tabular}
\end{table}

\begin{figure}
\centering
\caption{Posterior mean point estimates and 95\% posterior probability intervals for overall, associative, and dissociative effects in the analysis of \PMTen nonattainment designations.}\label{cehplot}
\includegraphics[width=.5\textwidth]{Figures/cehline_big.pdf}
\end{figure}



\subsection{Conclusion and Discussion of Case Study 1}\label{pm10discussion}
We have employed the principles of the causal inference perspective described in the \nameref{causalinference} Section to provide the first direct accountability assessment of one key feature of US air pollution regulatory policy: the initial \PMTen nonattainment designations that followed from the 1990 CAA Amendments.  Using a potential outcomes perspective, we explicitly define and estimate the causal effects of this specific set of regulatory decisions.  Although ambient \PMTen decreased in both attainment and nonattainment areas during the time frame of study, our results provide some evidence that 3-year average ambient \PMTen during 1999-2001 among areas designated as nonattainment in 1991-1995 was lower due to the nonattainment designations than it would have been if the designations had never occurred.  However, the small magnitude of this effect and associated uncertainty highlight the likely possibility that either the reductions in ambient \PMTen observed between the early 1990s and early 2000s are attributable to other control measures that exist outside of the paradigm of nonattainment designations, or simply that the average ambient concentration during 1999-2001 is not an appropriate measure to capture the impact of the nonattainment designations.  Furthermore, many areas likely started to take action to improve air quality before 1990 in anticipation of the impending designations, which is not reflected in our analysis that relies on data dating back only to 1990.  Despite the modest effect on average ambient \PMTen during 1999-2001, our results provide evidence that the nonattainment designations causally reduced  mortality and respiratory-related hospitalizations among Medicare beneficiaries residing near a monitor located in a nonattainment area, as compared to what would have occurred if the nonattainment designations had not taken place.  

Results from our investigation of the presumed causal pathway whereby nonattainment designations improve health outcomes through reducing ambient \PMTen differed depending on the outcome of interest.  The principal stratification analysis of the respiratory-related hospitalization outcome indicated an associative effect that was much larger in magnitude than the dissociative effect.  The estimated dissociative effect near zero indicates that the nonattainment designations did not cause reductions in hospitalizations among areas where \PMTen was not substantially causally affected.  The estimated associative effect that is different from zero indicates a causal reduction in respiratory hospitalizations in areas where \PMTen is estimated to have been causally reduced by more than 5 $\mu g /m^3$.  Knowledge that respiratory hospitalizations were causally reduced only when \PMTen was also causally reduced (large associative effect relative to dissociative effect) suggests the anticipated causal pathway, although this principal stratification analysis cannot conclusively indicate that the improvement in health outcomes is due to the causal reduction in \PMTen, as would be the case in a formal mediation analysis \citep{vanderweele_marginal_2009}. In particular, our analysis cannot rule out the possibility that the correspondence between effects on health and effects on pollution is driven by a factor other than the nonattainment designations, nor can our analysis discern whether health effects are due to an alternative causal pathway present within the areas where ambient \PMTen during 1999-2001 was causally reduced (see the \nameref{arpcasestudy} Section for a formal causal mediation analysis).   Nonetheless, we argue that knowledge that the causal effect of \PMTen nonattainment designations on respiratory hospitalization outcomes is most pronounced in areas exhibiting causal reductions in ambient \PMTen represents a result that is useful for informing future policies.

The principal stratification analysis of the CVD-related hospitalization outcome showed a similar pattern, with causal effects on this outcome most strongly pronounced in areas where the nonattainment designations decreased \PMTen; however, all estimates for this outcome were subject to substantial uncertainty.  The principal stratification analysis of the mortality outcome provided no evidence that dissociative effects were different from associative effects, which suggests that any evident causal effects of nonattainment designations on mortality are likely due to causal pathways other than the impact on 3-year average ambient \PMTen in 1999-2001.  Examples of other important causal pathways include simultaneous or synergistic impacts on other pollutants, the initiation of economic consequences that affect health outcomes in the long term, or even other measures of \PMTen, possibly in a different time frame \citep{hei_accountability_working_group_assessing_2003}. 


%Other likely causal pathways include average ambient \PMTen during a different time frame, alternative measures of \PMTen (e.g., the number of days with ambient levels above a certain threshold), economic changes initiated by the nonattainment designations, or impact of the nonattainment designations on other pollutants. 

% Importantly, the prospect of multipollutant causal pathways is particularly salient in this context, as actions taken to control \PMTen frequently affect pollution sources that emit several pollutants that could impact health outcomes.  Extensions of our framework have been considered in the multipollutant context \citep{zigler_estimating_2012}.

As with virtually all air pollution epidemiology, one significant challenge to causal inference is the prospect of confounding.  In the accountability context considered here, confounders are factors that differ between attainment and nonattainment areas that also bear some relationship with pollution and/or health outcomes.  Note that this is slightly different from studies of exposure-response relationships, where confounders are generally considered to be factors jointly associated with pollution and health outcomes.  Observed confounding in our context is particularly pronounced, as nonattainment areas were designated precisely because they exhibited (or contributed to) poor air quality, which is associated with a multitude of factors that differentiate attainment and nonattainment areas.  Our propensity score strategy was able to group together attainment and nonattainment locations that were similar on the basis of baseline pollution levels, characteristics of the Medicare beneficiaries residing within 6 miles, and numerous features of the general nearby population, thus minimizing the chance of confounding with regard to these factors.  Importantly, we discarded monitor locations in nonattainment areas that were not similar to \textit{any} attainment area on the basis of observed confounders (and vice versa).  Including these locations in our analysis would estimate causal effects that necessarily rely on model extrapolation beyond the information contained in the observed data, while removing them restricts our conclusions to only a subset of nonattainment areas. While our strategy is specifically designed to mitigate bias due to measured confounders, the prospect of unmeasured confounding remains a threat to the validity of our results.  
If there exist unmeasured factors related to pollution and/or health outcomes that, even after adjustment for all observed factors in Table \ref{datatab}, still differ between attainment and nonattainment areas, then our results are subject to unmeasured confounding.  

One consequence of framing our approach in a formal potential-outcomes framework is the need to precisely define a specific intervention of interest, here, the nonattainment designations.  While we argue that defining and estimating causal effects of the nonattainment designations is of considerable value, it should be noted that the designations resulted in a wide variety of specific actions to control air quality on state and local levels, and sometimes resulted in no action at all, for example, when regional control strategies were expected to reduce \PMTen to achieve the NAAQS regardless of the nonattainment designations.  This diversity of action (or inaction) is one likely explanation for the fact that annual ambient \PMTen decreased similarly in both nonattainment and attainment areas.  In this sense, estimating the causal effects of nonattainment designations is akin to an ``intention to treat'' analysis in clinical studies that consider causal effects of assignment to an intervention, as opposed to actual receipt of that intervention.  Available data on specific actions taken on local scales (e.g., measures in a SIP) could facilitate a causal analysis of these actions, but the use for informing future policy could be limited because specific control actions are often highly specialized to local circumstances (e.g., spraying wind blown dust in Central California) and may not be replicable or relevant in other areas or at future time points.  In focusing on the causal effects of a set of federal-level regulatory decisions, we exchange some level of detail with regard to actual control measures for precision in definition of the intervention and evidence of the effectiveness of a regulatory process that can (and will) be replicated in the future.  

A related limitation of our analysis is that it does not explicitly account for the regional nature of air quality control.  Due to issues such as regional pollution transport, actions undertaken in SIPs in particular areas may have impacts that spread across to other areas.  The prospect of actions in nonattainment areas having effects that spill over into nearby attainment areas would dilute the causal effects we wish to estimate; pollution and health outcomes in attainment areas are likely improved as well, and the present analysis does not account for this improvement.  Thus, pollution and health outcomes under a setting where nonattainment designations had never occurred may have actually been worse than was indicated in the ``control group'' of our analysis that may have experienced benefits of the designations.  This phenomenon is known in the causal inference literature as {\it interference}.  Our work in \cite{zigler_estimating_2012} outlines an assumption about interference under which our approach would be robust to the effects of regional pollution transport. The assumption relies on the feature that nonattainment designations implicitly considered regional transport to some extent in that a particular area's designation could be based on contributions to air quality in other areas.

%This is in contrast to an alternative analysis that considers collections of actions taken on state or local levels, which would not parse features of the regulatory environment that, if deemed successful, could be incorporated into future policymaking.  

Our analysis makes use of a vast data resource that links together information on regulatory actions, ambient air quality, population characteristics, and health information on the entire US Medicare enrollment population.  Accountability assessment of large-scale regulatory interventions relies on the large-scale availability of health data such as administrative hospital claims available on millions of individuals.  Our analysis focused on the susceptible population of elderly in the US that lived near a \PMTen monitor in the EPA monitoring network.  Alternative pollution measurement techniques (e.g., satellite measurements or spatial extrapolation) could expand such an analysis to consider individuals not residing close to a monitor location, and other administrative data sources (e.g., emergency department or electronic health records) could be used to focus on other populations.

Nonattainment designations are one key mechanism for air quality control in the US, and represent a key step in the enforcement of the NAAQS.  While our analysis entails important limitations, it provides evidence of the effectiveness of this integral feature of air quality management in the US, and represents a distinct perspective that should be interpreted in conjunction with, not instead of, the large body of epidemiological research motivating the setting and enforcement of NAAQS.  

\subsubsection{Additional Work in Progress}
In addition to the case study presented above, we have engaged in a variety of related research endeavors designed to improve direct accountability assessment of the \PMTen nonattainment designations.  In \cite{cefalu_posterior-predictive_2015}, we propose new methodology that generalizes the approach of omitting observations that lack propensity score ``overlap'' to allow for a stochastic filtering that ultimately weights causal estimates according to posterior evidence that each observation has a comparable observation in the opposite treatment group.  We are working to deploy this methodology to analyze the causal effects of nonattainment designations over time.  We are also working to corroborate the results of the analysis presented here with newly developed methods for Bayesian nonparametric principal stratification and causal mediation analysis that will be presented in the multipollutant context in the \nameref{arpcasestudy} Section.



\subsection{Case Study 2: Accountability Assessment of Power Plant Emissions Controls}\label{arpcasestudy}
Here we present an analysis of the second case study, which investigates the causal impacts (on multiple emissions and on ambient \PMTwo) of \SOTwo scrubbers on coal-fired power plants.  The primary focus of this case study is to illustrate the ideas outlined in the \nameref{causalpathways} Section and our newly-developed methods for principal stratification and causal mediation analysis in the multipollutant setting.  Accordingly, the discussion in this section contains significantly more technical methodological detail than the previous sections.

%%%%%%%%%%%%% Lifted from Barret's Paper %%%%%%%%%%%%%%%%%
Title IV of the Clean Air Act established the Acid Rain Program (ARP), which required major emissions reductions of \SOTwo (and other emissions) from American power plants. The goal of this program was to reduce total \SOTwo emissions by ten million tons relative to 1980 levels (29.5 million tons per year). This drop was to be achieved mostly through cutting emissions from electricity-generating units (EGUs), a process enacted in two stages. Phase I (1995-1999) required 263 extremely polluting units to significantly reduce their emissions. Phase II, which began in 2000, placed a target \SOTwo emissions cap of 8.95 million tons per year on about 3,200 EGUs, which cut power-sector emissions nearly in half from 1980 levels.

Impacts of the ARP have been evaluated extensively, and the program is generally lauded as a success story due to marked national decreases in \SOTwo and \NOx coming at relatively low cost.  Despite a 25\% increase in electricity production over the first 14 years of the program, \SOTwo emissions fell by 36\% (U.S. Environmental Protection Agency 2011b), and the program met its long-term goal of reducing EGU annual \SOTwo emissions to 8.95 million tons by 2007, with emissions declining further through at least 2010 \citep{schmalensee_so2_2012}.  Estimates of the annualized human health benefits of the entire ARP range from \$50 billion to \$100 billion \citep{burtraw_costs_1998, burtraw_cost_1999, chestnut_fresh_2005, banzhaf_valuation_2006, schmalensee_so2_2012}. Recent analyses have used air quality model simulations to provide more targeted estimates of the health benefits attributable to emissions reductions from EGUs \citep{levy_quantifying_2007, levy_uncertainty_2009, buonocore_using_2014}.  Whether attempting to quantify the health impacts of the ARP as a whole or provide analyses of emissions reductions from specific EGUs, the existing evidence of the health benefits relies heavily on presumed relationships between power plant emissions, ambient \PMTwo, and human health.  

While power plants under the ARP had latitude to elect a variety of strategies to reduce emissions such as changes in combustion technology or shifts in fuel composition, one key strategy is the installation of a flue-gas desulfurization technology (henceforth, a ``scrubber'') to reduce $\text{SO}_2$ emissions.   The precise extent to which installation of a scrubber reduces ambient $\text{PM}_{2.5}$ through reducing $\text{SO}_2$ emissions remains unknown, and has never been estimated empirically amid the realities of actual regulatory implementation where pollution controls may impact a variety of factors that are also related to the formation of $\text{PM}_{2.5}$.

The goal of this case study is to examine the causal effects of installing a scrubber on a coal-fired power plant on the ambient concentration of $\text{PM}_{2.5}$.  In particular, we aim to quantify the contribution of the presumptive causal pathway whereby a scrubber reduces $\text{SO}_2$ emissions, which in turn reduces ambient $\text{PM}_{2.5}$, relative to the importance of other causal pathways due to concurrent reductions (or co-benefits) in other emissions or other factors.  Thus, the question will be formally framed as one of mediation analysis: To what extent is the causal effect of a scrubber (the ``treatment'') on ambient $\text{PM}_{2.5}$ (the ``outcome'') mediated through reduced emissions of $\text{SO}_2$, $\text{NO}_x$ and $\text{CO}_2$ (the ``mediators'')? 

To answer these questions, we use the data sources described in the \nameref{datasection} Section to provide a more refined direct accountability assessment of the extent to which a particular emissions-control action reduces emissions and causes improvements to ambient air quality.  Specifically, we evaluate the extent to which an \SOTwo scrubber on a coal-fired power plant 1) causally affects emissions of \SOTwo, \NOx, and \COTwo, 2) causally affects ambient \PMTwo, and 3) affects ambient \PMTwo in a manner that is mediated through reducing \SOTwo, \NOx, and/or \COTwo.  We focus in particular on the question of mediated effects to provide rigorous evidence of the presumed relationships between actions, emissions, and ambient pollution that underlie a great deal of existing health benefits analyses.  Towards this end, we deploy our newly-developed Bayesian nonparametric statistical methods that draw on two frameworks for estimating causal effects in the presence of intermediate mediating variables: (1) principal stratification \citep{frangakis_principal_2002} and (2) direct and indirect effects, or so-called ``causal mediation analysis'' \citep{robins_identifiability_1992, pearl_direct_2001, vanderweele_mediation_2014}.  Both frameworks require development of new statistical methods to accommodate the multipollutant nature of the problem.



\subsubsection{Linked Data Sources}
Using the tools described in the \nameref{datasection} Section, we assembled a national database to conduct the investigation.  We obtain annual emissions data on 269 coal-fired power plants during the year 2005.  We also obtain various characteristics of the plants including \NOx emissions controls, annual average heat input in 2004, participating phase of the ARP (I or II), operating time in 2004, and coal sulfur content in 2004 (see Table \ref{arpdatatab}).  

We link each power plant to all ambient \PMTwo monitors located within a 150km radius of the plant.  Monitors located within 150km of more than one plant are linked to the closest plant.  For each plant, we calculate the average annual ambient \PMTwo concentration in 2005 among all \PMTwo monitors linked to that plant, as well as average temperature and barometric pressure during 2004.  Constructing the data set in this way entails many important limitations, mostly due to the realities of regional pollution transport.  We clarify the causal quantities being estimated in the \nameref{definingscrubbers} Section, and revisit this limitation in the \nameref{arpdiscussion} Section.
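
The linkage just described can be summarized compactly, using notation introduced here only for illustration (the symbols $d(i,j)$, $\mathcal{S}_i$, and $\bar{Y}_i$ do not appear elsewhere in this report).  Let $d(i,j)$ denote the distance between power plant $i$ and monitor $j$, and let
$$\mathcal{S}_i = \{\, j : d(i,j) \le 150\text{km} \ \text{ and } \ d(i,j) \le d(i',j) \text{ for all plants } i' \,\}$$
denote the set of monitors linked to plant $i$, so that each monitor is linked to at most one plant.  The ambient \PMTwo value assigned to plant $i$ is then the average over linked monitors,
$$\bar{Y}_i = \frac{1}{|\mathcal{S}_i|} \sum_{j \in \mathcal{S}_i} Y_j,$$
where $Y_j$ denotes the average annual ambient \PMTwo concentration in 2005 at monitor $j$.  This is a sketch of the linkage rule only; it ignores ties in distance and does not address plants for which $\mathcal{S}_i$ is empty.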

\begin{figure}[h]
\caption{Locations of power plants and linked ambient \PMTwo monitors for the analysis of \SOTwo scrubber effects on emissions and ambient \PMTwo.}\label{arpmap}
\includegraphics{Figures/PP_PM.pdf}
\end{figure}


\begin{table}[h]
\caption{Summary statistics for covariates and outcomes available for the analysis of \SOTwo scrubbers.}\label{arpdatatab}
\begin{tabular}{lcccc} \hline \hline
 & \multicolumn{2}{c}{\underline{Have scrubbers (n=70)}} & \multicolumn{2}{c}{\underline{Have no scrubber (n=199)}} \\
  & Mean & SD & Mean & SD \\ \hline

   \underline{Monitor Data} \\
 Average Ambient \PMTwo 2005 & 11.59 &3.88&13.14&2.68\\
 Average Temperature 2004 &13.05&4.73&13.74&3.57\\
 Average Barometric Pressure 2004 &718.45&52.71&741.12&24.20\\

 \\ \underline{Power Plant Level Data} \\
Total \SOTwo Emission 2005 &1320.03& 1823.4&2222.26&2540.64\\
Total \NOx Emission 2005 &917.75&774.03&605.38&557.11\\
Total \COTwo Emission 2005 &565349&475743&391091.9&375841.8\\

 \\ \underline{Unit Level Data} \\
\NOx Scrubber Installation, Jan. 2005  &0.89&0.3&0.79&0.36\\
Average Heat Input 2004 &5501996&4623661& 3845140&3683800\\
Phase II Indicator 2004 &0.83&0.36&0.76&0.41\\
Total Operating Time 2004 &7697.35&650.73&7371.97&1007.99\\
Sulfur Content in Coal 2004 &1.13&0.89&0.65&0.43\\


\end{tabular}
\end{table}


\subsubsection{Defining the Intervention: \SOTwo Scrubber Installation}\label{definingscrubbers}
A power plant (or power generating facility) can consist of multiple EGUs, each of which may or may not be equipped with a scrubber.   Thus, to define the intervention at the level of the facility, we regard any facility to be ``treated'' with a scrubber if at least 10\% of the total heat input for that facility can be attributed to EGUs within that facility that have scrubbers installed as of January 2005.  Appendix \ref{app:pp:scrubbers} shows the distribution of the percentage of heat input from EGUs with a scrubber across all 269 power plants, which illustrates that the vast majority of facilities had nearly all or nearly none of their heat input attributed to EGUs with scrubbers.  This suggests that the exact threshold used to define a facility as ``treated'' with a scrubber is relatively unimportant. 
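
To make this definition concrete, it can be written as follows, where the symbols $H_{ij}$ and $S_{ij}$ are notation introduced here for illustration only.  Let $H_{ij}$ denote the heat input of EGU $j$ at facility $i$, and let $S_{ij}=1$ if that EGU has a scrubber installed as of January 2005 and $S_{ij}=0$ otherwise.  The facility-level treatment indicator is then
$$Z_i = \begin{cases} 1 & \text{if } \dfrac{\sum_j S_{ij} H_{ij}}{\sum_j H_{ij}} \ge 0.10, \\[2ex] 0 & \text{otherwise.} \end{cases}$$
For example, a hypothetical facility with two EGUs of equal heat input, only one of which has a scrubber, has a scrubbed share of $0.50$ and would be classified as treated.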

The ``control'' condition used for comparison is the setting where no scrubbers were installed.  Thus, causal effects in this case study relate to comparisons between emissions and ambient \PMTwo that would be potentially observed if a particular facility did or did not adopt scrubbers to control \SOTwo emissions.   The ``intervention group'' of the study consists of the 70 power plants that had scrubbers in January 2005, and the ``control group'' consists of the 199 plants that did not.  Note that three power plants were excluded from this analysis because they installed scrubbers during 2005.


\subsubsection{Defining Potential Outcomes for Principal Stratification and Causal Mediation Analysis}\label{mediationpotentialoutcomes}
The primary objective of this case study is to characterize the extent to which installing a scrubber impacts ambient \PMTwo, ``through'' reducing emissions of \SOTwo, \NOx, and \COTwo.  The \nameref{causalpathways} Section outlines the rationale behind two analytic perspectives on such questions.   The \nameref{pm10nonattainment} Section illustrates one application of principal stratification for accountability assessment in the context of a single pollutant.  For the current case study, we refine the previous descriptions of potential outcomes, principal stratification, direct, and indirect effects to illustrate our statistical methods development to accommodate multipollutant accountability settings.

We formulate the approach with explicit potential-outcomes notation \citep{rubin_bayesian_1978}.  Consider a single power plant and let $Z \in \{0,1\}$ denote whether the power plant has scrubber(s) installed in January 2005, with $Z=1$ denoting the presence of a scrubber. Let $\{ M_k(z); k=1,\ldots,K \}$ denote the potential emissions of $K$ pollutants that would occur if the power plant were to have scrubber status $Z=z$, for $z=0,1$.  Henceforth, we fix $K=3$ so that $M_k(z), k=1,2,3$ denotes the potential emissions of \SOTwo, \NOx, and \COTwo, respectively.  The causal effect of the scrubber on emission $k$ can then be defined as a comparison between $M_k(1)$ and $M_k(0)$, the emissions that would be observed under the ``treatment'' and ``control'' conditions, respectively.  Let $\bM(z_1, z_2,z_3) \equiv \{M_1(z_1), M_2(z_2), M_3(z_3)\}$ denote potential emissions under a set of three scrubber statuses $\{z_1, z_2,z_3\}$.  For example, $\bM(1,0,0)$ would represent the potential \SOTwo emissions under installation of a scrubber and the potential \NOx and \COTwo emissions that would be observed if the scrubber had not been installed.

We similarly define potential \PMTwo outcomes, but extend the notation to define potential concentrations under different potential values of scrubber status, $Z$, and different possible values of emissions, $\bM(z_1, z_2,z_3)$.  Thus, in full generality, each power plant has a set of
$2^{4}=16$ potential outcomes for $\text{PM}_{2.5}$, $Y(z;\bM(z_1,z_2,z_3))$, which denote potential values of $\text{PM}_{2.5}$ that would be observed
under intervention $Z=z$ with pollutant emissions set at values under
interventions $z_1, z_2, z_3$.  Definition of all $16$ potential \PMTwo concentrations implies that each emission could, at least in theory, be intervened upon independently of \PMTwo and the other emissions.  It is worth noting that values of $Y(z;\bM(z_1, z_2,z_3))$ can be categorized into two groups. For  $z=z_1=z_2=z_3$, potential outcomes are observable from the data, that is, any power plant with a scrubber will have $Y(1;\bM(1,1,1))$ observed, and any power plant without a scrubber will have $Y(0;\bM(0,0,0))$ observed.  We refer to these as {\it observable} potential outcomes.  In contrast, potential outcomes defined under any other values of the vector $(z, z_1, z_2, z_3)$ represent potential outcomes where a power plant is simultaneously subject to different interventions, and can never be observed in practice.  For example, $Y(1;\bM(0,0,1))$ would represent the potential ambient \PMTwo concentration near a plant under the hypothetical scenario where that plant installs a scrubber ($z=1$), but where emissions of \SOTwo and \NOx are set to what they would be without the scrubber ($z_1=z_2=0$) and emissions of \COTwo are set to what they would be with the scrubber ($z_3=1$).  We refer to these potential outcomes as {\it unobservable} (or {\it a priori} counterfactual), as they are never observed for any power plant.  Estimating causal effects relying on such unobservable potential outcomes will rely on unverifiable assumptions relating each of these unobservable quantities to observed relationships in the data.
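
To make the count explicit: each of the four arguments $(z, z_1, z_2, z_3)$ takes values in $\{0,1\}$, giving
$$2 \times 2^{3} = 16$$
combinations for each power plant.  Exactly two of these, $(z,z_1,z_2,z_3) = (1,1,1,1)$ and $(0,0,0,0)$, correspond to observable potential outcomes, so that the remaining $16-2=14$ are unobservable.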

Note that the total effect (TE) of scrubber installation on ambient \PMTwo can be defined as the comparison between the observable potential outcomes $Y(1;\bM(1,1,1))$ and $Y(0;\bM(0,0,0))$.  Various other causal effects related to ``causal pathways'' will be defined based on comparisons between different combinations of the above potential outcomes.  

{\bf Principal stratification} defines causal effects based only on the observable potential outcomes $Y(1;\bM(1,1,1))$ and $Y(0;\bM(0,0,0))$.  Associative effects represent causal effects of a scrubber on ambient \PMTwo among power plants where emissions are causally affected by the scrubber.  Dissociative effects represent causal effects of a scrubber on ambient \PMTwo among power plants where emissions are not meaningfully affected by the scrubber.  In the presence of multiple pollutants, associative and dissociative effects can be defined as functions of changes in each of the $K=3$ emissions.  Following development in \cite{zigler_estimating_2012}, we focus discussion on average (or expected) associative and dissociative effects defined as:
\begin{eqnarray*}
\text{EDE}_\mathcal{K} &=& E[Y(1;\bM(1,1,1)) - Y(0;\bM(0,0,0)) \,\big\vert\,
|(\bM(0,0,0) - \bM(1,1,1) )|_{\mathcal{K}} < C _{\mathcal{K}}^D],\\
\text{EAE}_\mathcal{K} &=& E[Y(1;\bM(1,1,1)) - Y(0;\bM(0,0,0)) \,\big\vert\,
|(\bM(0,0,0) - \bM(1,1,1) )|_{\mathcal{K}} > C _{\mathcal{K}}^A],
\end{eqnarray*}
where $|(\bM(0,0,0) - \bM(1,1,1) )|_{\mathcal{K}}$ denotes a vector of
absolute differences between potential emissions of the subset of pollutants in the set 
$\mathcal{K}$, with $>$ and $<$ representing element-wise comparisons
between vectors of mediators.  For example, $\mathcal{K} = \{1,2\}$ would be used to define associative and dissociative effects based only on causal effects on emissions of \SOTwo and \NOx, without regard to the effect on \COTwo. Here, $C_\mathcal{K}^A$ denotes a vector of thresholds beyond which a change in each pollutant emission in $\mathcal{K}$ is considered meaningful, while $C_\mathcal{K}^D$ is a vector of
thresholds below which changes in these pollutant emissions are considered not
meaningful. For example, with $\mathcal{K}=\{1,2,3\}$ and
$C^A_{\mathcal{K}=\{1,2,3\}} \equiv \{C^A_1, C^A_2, C^A_3\}$,
$\text{EAE}_\mathcal{K}$ would represent the average causal effect on $\text{PM}_{2.5}$ among power plants for which scrubber installation causally affected emissions of \SOTwo, \NOx, and \COTwo in excess of $C^A_1, C^A_2, C^A_3$, respectively.  Estimates of EAE and EDE are useful summary measures of causal effects on \PMTwo, on average, when emissions change or do not change, but the relationship between causal effects on \PMTwo and causal effects on emissions can vary as an entire surface describing effects on \PMTwo for any particular values of $\bM(0,0,0)$ and $\bM(1,1,1)$.  In addition to estimating EDE and EAE for different $\mathcal{K}$ as defined above, we also estimate entire surfaces of, for example, how the causal effect on \PMTwo varies as a function of the causal effect on each emission.  

{\bf Causal mediation analysis} relies on the definition of {\it natural direct} and {\it natural indirect effects} \citep{robins_identifiability_1992, pearl_direct_2001}, which are defined based on potential outcomes that are described above as unobservable.  Natural direct effects in this context are defined as causal effects of scrubber installation on \PMTwo when emissions are set to their ``natural'' value that would be observed without a scrubber, thus representing the causal effect of scrubber installation on \PMTwo that is ``direct'' in the sense that it is not attributable to changes in emissions.  Formally, we define the natural direct effect (NDE) in the multipollutant setting as $\text{NDE}$ = $E[Y(1;\bM(0,0,0))-Y(0; \bM(0, 0, 0))]$. 

Natural indirect effects in this context are defined to represent causal effects of scrubber installation on \PMTwo that are attributable only to emissions changes.  In the multipollutant setting, different natural indirect effects are defined based on different multipollutant emissions.  The joint natural indirect effect of all $3$ mediators, $\text{JNIE}_{123}$ (i.e., the indirect effect attributable to changes in all three emissions), is derived by subtracting the natural direct effect from the total effect, $\text{JNIE}_{123}=\text{TE}-\text{NDE} = E[Y(1;\bM(1,1,1))-Y(1; \bM(0, 0,0))]$, where TE is as defined in the \nameref{mediationpotentialoutcomes} Section.  
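
The decomposition underlying this definition can be verified directly, taking the TE to be a difference in expectations (as in the expression above) and adding and subtracting $E[Y(1;\bM(0,0,0))]$:
\begin{eqnarray*}
\text{TE} &=& E[Y(1;\bM(1,1,1)) - Y(0;\bM(0,0,0))] \\
&=& E[Y(1;\bM(1,1,1)) - Y(1;\bM(0,0,0))] + E[Y(1;\bM(0,0,0)) - Y(0;\bM(0,0,0))] \\
&=& \text{JNIE}_{123} + \text{NDE}.
\end{eqnarray*}
This identity follows from the definitions alone; assumptions are needed only to estimate the term $E[Y(1;\bM(0,0,0))]$, which involves an unobservable potential outcome.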

In addition to $\text{JNIE}_{123}$, which is of interest, we introduce a decomposition of this joint effect into the natural indirect effect attributable to changes in different combinations of the $K=3$ emissions.  The $\text{JNIE}_{123}$ can be decomposed into emission-specific indirect effects and the joint indirect effects of all possible pairs of emissions.  See Figure \ref{decomp} for a graphical representation of the different natural indirect effects.

\begin{figure}[h]
\centering
\scalebox{0.09}
{\includegraphics{Figures/decomp1.png}} 
\caption{Graphical representation of partitioning the $\text{JNIE}_{123}$ for 3 mediators}\label{decomp}
\label{fig1}
\end{figure}

The {\em mediator-specific} NIE for the $k^{th}$ emission is defined as a comparison between potential \PMTwo outcomes where the $k^{th}$ emission varies between what it would be with and without a scrubber, but all other emissions are fixed to the potential value that would be observed with the scrubber.  The mediator-specific NIEs for emissions of \SOTwo, \NOx, and \COTwo are defined as:
$$\text{NIE}_1 = E[Y(1; M_1(1), M_2(1), M_3(1)) - Y(1; M_1(0), M_2(1), M_3(1))],$$ 
$$\text{NIE}_2 = E[Y(1; M_1(1), M_2(1), M_3(1)) - Y(1; M_1(1), M_2(0), M_3(1))], $$ and
$$\text{NIE}_3 = E[Y(1; M_1(1), M_2(1), M_3(1)) - Y(1; M_1(1), M_2(1), M_3(0))],$$
respectively.

In a similar fashion we can define the joint natural indirect effect
attributable to changes in a pair of mediators $j$ and $k$, $\text{JNIE}_{jk}$ (the second row in Figure \ref{decomp}).  The joint natural indirect effect of mediators $j$ and $k$ is defined as the difference
between the potential \PMTwo that would be observed with a scrubber and the
analogous potential outcome but with pollutants $j$ and $k$ set to the values that would have been emitted without the scrubber. Specifically, the joint natural indirect
effect for changes in \SOTwo and \NOx is defined as:
$$\text{JNIE}_{12} = E[Y(1; M_1(1), M_2(1), M_3(1)) - Y(1; M_1(0), M_2(0), M_3(1))],$$ 
with the analogously-defined effects for (\NOx and \COTwo) and (\SOTwo and \COTwo) defined as
$$\text{JNIE}_{23} = E[Y(1; M_1(1), M_2(1), M_3(1)) - Y(1; M_1(1), M_2(0), M_3(0))], $$ and
$$\text{JNIE}_{13} = E[Y(1; M_1(1), M_2(1), M_3(1)) - Y(1; M_1(0), M_2(1), M_3(0))],$$
respectively.  

As can be seen in Figure \ref{fig1}, the joint natural indirect effect of each
pair of mediators is not equal to the sum of corresponding mediator-specific NIEs.  That is, our definitions of indirect effects do not assume additivity of effects, nor do they assume that the indirect effects are non-overlapping.  This is essential in the multipollutant setting where multiple pollution emissions are measured contemporaneously and are not generally independent of each other.  Thus, our methods development is in important contrast to the budding literature on mediation analysis with multiple mediators that tends to assume non-overlapping, independent, sequential, or additive effects \citep{mackinnon_introduction_2008, imai_identification_2013,  daniel_causal_2014, vanderweele_mediation_2014}.
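
To see the non-additivity explicitly, write $Y_{abc}$ as shorthand (introduced here only for compactness) for $Y(1; M_1(a), M_2(b), M_3(c))$.  From the definitions of $\text{JNIE}_{12}$, $\text{NIE}_1$, and $\text{NIE}_2$ above,
\begin{eqnarray*}
\text{JNIE}_{12} - (\text{NIE}_1 + \text{NIE}_2) &=& E[Y_{111} - Y_{001}] - E[Y_{111} - Y_{011}] - E[Y_{111} - Y_{101}] \\
&=& E[Y_{011} + Y_{101} - Y_{111} - Y_{001}].
\end{eqnarray*}
The right-hand side is an interaction contrast: it equals zero when changes in \SOTwo and \NOx emissions combine additively in their effect on \PMTwo (with \COTwo emissions held at $M_3(1)$), but it need not be zero in general.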


\subsubsection{Estimation: New Methods for Bayesian Nonparametric Mediation Analysis}
Estimation and inference for the causal effects defined above are based on models for the joint distribution of outcomes and mediators,
$$[Y(0;\bM(0,0,0)),Y(1;\bM(1,1,1)), \bM(0,0,0), \bM(1,1,1)|\boldsymbol{X}],$$ 
where $\boldsymbol{X}$ denotes a vector of baseline covariates used to adjust for confounding (see Table \ref{arpdatatab}). This joint distribution is not identified based on the observed data, as potential outcomes are never jointly observed in both the presence and absence of a scrubber.  Thus, our estimation strategy can be characterized as consisting of three steps.  First, we specify flexible Bayesian nonparametric models for the marginal distributions of observed data, which consist of values of $(Y(0;\bM(0,0,0)), \bM(0,0,0))$ observed for power plants that did not install scrubbers, and $(Y(1;\bM(1,1,1)), \bM(1,1,1))$ observed for power plants that did install scrubbers \citep{escobar_bayesian_1995, muller_bayesian_1996, jara_dppackage:_2011}.  Second, we link all of these flexibly-modeled marginal distributions into a coherent joint distribution through the use of a Gaussian copula model \citep{nelsen_introduction_1999}.  Unobserved (but observable) potential outcomes are then simulated from their posterior-predictive distributions to estimate the TE and the associative and dissociative effects.  Finally, a series of assumptions are used to relate unobservable potential outcomes to observed relationships in the data to provide estimates of the natural direct and indirect effects.  Details of the statistical models, the assumptions for identification, and the MCMC computational algorithm appear in Appendix \ref{app:pp}.



\subsection{Case Study 2 Results}
Table \ref{arpdatatab} indicates that the 150km areas around power plants with \SOTwo scrubbers installed in January 2005 had lower average ambient \PMTwo than the areas surrounding power plants without scrubbers (11.6 vs. 13.1 $\mu g/m^3$).  The power plants with \SOTwo scrubbers also emitted less \SOTwo, more \NOx, and more \COTwo than the plants without scrubbers, and they also tended to burn coal with a higher sulfur content.  

Using the approach outlined above, having an \SOTwo scrubber installed is estimated to cause \SOTwo emissions to be 1.00 log(tons) lower, on average, than they would be without the scrubber (posterior mean effect -1.00; 95\% posterior interval (-1.41, -0.64)).  The analogous causal effects for \NOx and \COTwo emissions were 0.15 (-0.12, 0.41) and 0.06 (-0.11, 0.24), respectively; both intervals include zero, providing no evidence that \SOTwo scrubbers causally affected these emissions, on average.  The total effect (TE) of having an \SOTwo scrubber installed on ambient \PMTwo within a 150km radius of a power plant is estimated to be -1.18 $\mu g/m^3$ (95\% posterior interval (-2.09, -0.29)), indicating that having a scrubber installed causally reduces ambient \PMTwo relative to what would occur in the absence of a scrubber.  
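
To give a sense of scale, the estimated effect on \SOTwo emissions can be re-expressed as a multiplicative change.  This conversion assumes that log(tons) denotes the natural logarithm and that the effect is an average difference in log emissions; neither is stated explicitly here, so the figures below are a rough guide only:
\begin{align*}
\exp(-1.00) &\approx 0.37, & \exp(-1.41) &\approx 0.24, & \exp(-0.64) &\approx 0.53.
\end{align*}
Under these assumptions, the posterior mean corresponds to roughly a 63\% reduction in \SOTwo emissions, with the interval endpoints corresponding to reductions of roughly 76\% and 47\%.  Because the averaging is done on the log scale, these figures describe a typical (geometric-mean) ratio rather than a ratio of average emissions.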


%\begin{table}[h]
%\centering \caption{Posterior means (95\% C.I.s) of the caual effects on emissions.}\label{tab:emissions}
%\begin{tabular}{c|ccc}
% & \SOTwo & \NOx & \COTwo \\\hline
% Mean & -1.000 & 0.149 & 0.060\\ 
% 95\% C.I. & (-1.409, -0.635) & (-0.115, 0.413) & (-0.114, 0.236) \\
%\end{tabular}
%\end{table}

The principal stratification analysis can provide important evidence of the extent to which causal reductions in \PMTwo are associative or dissociative with changes in emissions of \SOTwo, \NOx, and \COTwo. We define average associative and dissociative effects for 1) the triplet of \SOTwo, \NOx, and \COTwo emissions; 2) each pair of emissions; and 3) each emission individually.  For the $k^{th}$ emission, let $\sigma_k$ denote the standard deviation of the estimated individual-level causal effect of a scrubber on the $k^{th}$ emission, with $\sigma_1 = 1.526$, $\sigma_2 = 1.084$, $\sigma_3 = 0.945$.  We consider changes within $0.5\sigma_k$ of 0 to represent little or no causal effect on emissions, and changes in excess of $0.5\sigma_k$ to represent meaningful changes in emissions.  Thus, for the $\text{EDE}_\mathcal{K}$, we set $C_{\mathcal{K}}^D = 0.5 \sigma_\mathcal{K}$ to define the average effect of a scrubber on ambient \PMTwo among power plants where there is little or no estimated causal effect on the emissions in $\mathcal{K}$.  For each $\mathcal{K}$, we define two types of EAE: $\text{EAE}_1$ representing the causal effect of a scrubber on \PMTwo among power plants where emissions were {\it reduced} by at least $0.5\sigma_k$ relative to what they would have been without the scrubber, and $\text{EAE}_2$ representing the analogous effect among power plants where emissions were {\it increased} by at least $0.5\sigma_k$.  These correspond to $C_{\mathcal{K}}^A = 0.5\sigma_\mathcal{K}$.
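
The thresholds implied by these values can be written out explicitly.  Taking $k=1,2,3$ to index \SOTwo, \NOx, and \COTwo, respectively, and reading the three listed standard deviations in that order (an assumption about the ordering), the single-emission cutoffs are
\begin{align*}
0.5\sigma_1 &= 0.5 \times 1.526 = 0.763,\\
0.5\sigma_2 &= 0.5 \times 1.084 = 0.542,\\
0.5\sigma_3 &= 0.5 \times 0.945 = 0.4725.
\end{align*}
Writing $\Delta_k = M_k(1) - M_k(0)$ for the plant-level causal effect on the $k^{th}$ emission (shorthand introduced only for this illustration), a plant contributes to the EDE for emission $k$ when $|\Delta_k| < 0.5\sigma_k$, to $\text{EAE}_1$ when $\Delta_k \leq -0.5\sigma_k$, and to $\text{EAE}_2$ when $\Delta_k \geq 0.5\sigma_k$.  For example, a plant counts toward $\text{EAE}_1$ for \SOTwo only if the scrubber lowers its \SOTwo emissions by at least 0.763 on the scale on which emissions are modeled.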

%%%%  ($\sigma_1 = 1.526$, $\sigma_2 = 1.084$, $\sigma_1 = 0.945$)

\begin{table}[h]
\centering \caption{Posterior mean and 95\% probability intervals for expected associative and dissociative effects of \SOTwo scrubbers.}\label{tab:ps}
\resizebox{\textwidth}{!}{  
\begin{tabular}{c|c|ccccccc}
& & \SOTwo & \NOx & \COTwo & \SOTwo \& \NOx & \SOTwo \& \COTwo & \NOx \& \COTwo & \SOTwo \& \NOx \& \COTwo \\
\hline\hline
\multirow{2}{*}{EAE$_1$}& Mean &  -1.402 & -1.251 & -1.469 & -1.385 & -1.605 & -1.452 & -1.549 \\
& 95\% P.I.& (-2.323, -0.565) & (-2.197, -0.375) & (-2.458, -0.554) & (-2.348, -0.522) & (-2.606, -0.681) & (-2.418, -0.614) & (-2.531,-0.692)\\
\hline
\multirow{2}{*}{EDE}& Mean &  -1.063 & -1.253 & -1.221 & -1.047 & -1.082 & -1.245 & -1.062 \\
& 95\% P.I. & (-2.096, -0.062) & (-2.114, -0.358) & (-2.142, -0.349) & (-2.119, -0.030) & (-2.164, -0.065) & (-2.124, -0.360) & (-2.115,-0.067)\\
\hline
\multirow{2}{*}{EAE$_2$}& Mean &-0.669 & -1.143 & -0.985 & -0.696 & -0.643 & -0.987 & -0.675 \\
& 95\% P.I. & (-2.066, 0.963) & (-2.210, -0.069) & (-1.988, -0.026) & (-2.192, 1.079) & (-2.197, 1.004) & (-2.057, 0.081) & (-2.250,1.012)\\
\end{tabular}}
\end{table}


For each set of emissions, Figure \ref{fig4} depicts the posterior mean estimate of EDE (blue circle), EAE$_1$ (red circle), and EAE$_2$ (grey circle).  The size of each circle is proportional to the estimated proportion of power plants that contribute to the estimate, that is, the estimated proportion of plants whose changes in emissions satisfy the thresholds $C_{\mathcal{K}}^D$ and $C_{\mathcal{K}}^A$.  For example, the red circle in the first column of Figure \ref{fig4} indicates that a scrubber is estimated to reduce \SOTwo emissions by at least $C_{\mathcal{K}}^A$ for 58\% of power plants, and the average causal effect on \PMTwo among these power plants is a reduction of 1.40 $\mu g/m^3$ relative to what would have occurred if a scrubber had not been installed.   The blue circle indicates that a scrubber is estimated to have little or no causal effect on \SOTwo emissions among 31\% of power plants, and the average causal effect of the scrubber on ambient \PMTwo around these plants is a reduction of 1.06 $\mu g/m^3$.  The grey circle indicates that a scrubber is estimated to increase \SOTwo emissions by at least $C_{\mathcal{K}}^A$ for 11\% of power plants, and the average causal effect of the scrubber on ambient \PMTwo around these plants is a reduction of 0.67 $\mu g/m^3$.  The second and third columns of Figure \ref{fig4} can be interpreted analogously, but for \NOx and \COTwo, respectively.  Columns 4--6 of Figure \ref{fig4} depict EDE, EAE$_1$, and EAE$_2$ defined based on changes in the corresponding pairs of emissions.  For example, the red circle in column 5 of Figure \ref{fig4} indicates that the average causal effect on \PMTwo among power plants where both \SOTwo and \COTwo were reduced was a reduction of 1.60 $\mu g/m^3$, and 20\% of plants were estimated to experience such emissions reductions.  Table \ref{tab:ps} lists posterior means and 95\% posterior intervals of EDE, EAE$_1$, and EAE$_2$ for all possible $\mathcal{K}$.  
Overall, we see that all estimates of EDE, EAE$_1$ and EAE$_2$ are negative, implying that scrubbers causally reduce \PMTwo regardless of the precise manner of emissions reductions of \SOTwo, \NOx, and \COTwo.  This indicates the existence of some other causal pathway whereby scrubbers reduce \PMTwo in a manner that is not captured solely by the emissions reductions of these pollutants.  Note that EAE$_1$ tends to be larger in magnitude than the EDE whenever $\mathcal{K}$ includes \SOTwo emissions, implying that the scrubbers reduce \PMTwo more when there is a larger reduction in \SOTwo emissions.  This is suggestive of the anticipated causal pathway whereby the scrubber reduces \SOTwo which, in turn, reduces \PMTwo. A similar pattern is also slightly evident for reductions in \COTwo. 

\begin{figure}
\centering
%\scalebox{0.51}
\includegraphics[width = \textwidth]{Figures/ps_plot_05.pdf}
\caption{Posterior mean estimates of average associative and dissociative effects of \SOTwo scrubbers.  Size of circle is proportional to the percent of observations estimated to belong in the corresponding strata, and number listed is posterior mean proportion.}
\label{fig4}
\end{figure}


While estimates of EDE, EAE$_1$, and EAE$_2$ provide useful summary quantities of how the causal effect of a scrubber on \PMTwo varies with the causal effect of the scrubber on emissions, we can also examine entire surfaces of how the scrubber effect on \PMTwo varies with the causal effect on emissions.  Figure \ref{fig6} depicts 3-D surface plots for each emission, where the surface summarizes the estimated causal effect on \PMTwo for any estimated causal effect of the scrubber on emissions.  Figure \ref{fig6}\subref{so2} depicts the estimated effect on \PMTwo, $Y(1, \bM(1,1,1)) - Y(0, \bM(0,0,0))$, for any combination of $(M(0), M(1))$ across the range observed in the data.  Note that nearly the entire surface lies below 0, indicating that the scrubber is estimated to reduce \PMTwo regardless of the effect on \SOTwo emissions.  The surface is lowest in the region where $M(1) < M(0)$, and the steepest portion of the surface is in regions where $M(0)$ is high; that is, the causal effect on \PMTwo is estimated to be most pronounced in power plants that would have the highest emissions absent a scrubber and that exhibit a causal reduction in emissions due to the scrubber.  The analogous surfaces for \NOx and \COTwo in Figures \ref{fig6}\subref{nox} and \ref{fig6}\subref{co2} are much flatter, which suggests that any causal effect of a scrubber on \PMTwo varies only slightly with varying causal effects on these emissions.  Note that in Figure \ref{fig6}\subref{so2}, the blue dots representing the estimated causal effect of the scrubber on \SOTwo emissions for one MCMC iteration lie almost entirely in the region where $M(1) < M(0)$, reflecting that the scrubber is estimated to reduce \SOTwo emissions for nearly all power plants.  In Figures \ref{fig6}\subref{nox} and \ref{fig6}\subref{co2}, the analogous blue dots lie more closely and symmetrically around the line $M(1) = M(0)$, reflecting that \SOTwo scrubbers do not have strong effects on these emissions on average.

The overall conclusion of the principal stratification analysis is that 1) causal effects of a scrubber on ambient \PMTwo are evident regardless of the causal impacts on emissions of \SOTwo, \NOx, and \COTwo; 2) larger causal reductions in \SOTwo are associated with larger causal reductions in \PMTwo, regardless of changes in other emissions; and 3) there is some evidence that scrubber effects on ambient \PMTwo are also associated with reductions in \COTwo emissions relative to what they would have been absent the scrubber.  While not conclusive about the mediated indirect effects whereby the scrubber reduces \PMTwo through reducing \SOTwo emissions, this analysis is highly consistent with the presence of such a causal pathway.  

\begin{figure}[h]
\vskip -13ex
\subfloat[$k=1$ (\SOTwo)]{
  \includegraphics[width=.5\textwidth]{Figures/3D_plot1_true.png}\label{so2}}
\subfloat[$k=2$ (\NOx)]{
  \includegraphics[width=.5\textwidth]{Figures/3D_plot2_true.png}\label{nox}}\\
\subfloat[$k=3$ (\COTwo)]{
  \includegraphics[width=.5\textwidth]{Figures/3D_plot3_true.png}\label{co2}}
\caption{Average surface plots of the causal effect on \PMTwo for different values of  $(M_k(0), M_k(1))$.  Values of $(M_k(0), M_k(1))$ are plotted on the $x$- and $y$-axes, and determine the causal effect of a scrubber on emission $k$.  The corresponding value of the causal effect of a scrubber on \PMTwo ($Y(1)-Y(0)$) is plotted on the $z$-axis.  The blue cloud of points shows simulations of $(M_k(0), M_k(1))$ for one MCMC iteration to represent the range of values of $(M_k(0), M_k(1))$ consistent with the observed data.  Red lines are at $M_k(0) =  M_k(1)$ (solid line) and $\pm \sigma_k$ (dashed lines). }
\label{fig6}
\end{figure}

Augmenting the principal stratification analysis with assumptions about unobservable potential outcomes that conceive of independent manipulations of scrubbers and each emission individually permits estimation of natural direct and indirect effects.  These effects speak more directly to the extent to which the effect of an \SOTwo scrubber on ambient \PMTwo within 150km is mediated through various emissions. Recall that the mediator-specific natural indirect effect of the $k^{th}$ emission and the joint natural indirect effect of the $j^{th}$ and $k^{th}$ emissions are denoted as
$\text{NIE}_k$ and $\text{JNIE}_{jk}$, respectively, where $k=1$
indicates $\text{SO}_2$, $k=2$ indicates $\text{NO}_x$ and $k=3$
indicates $\text{CO}_2$.

Table \ref{tab:mediation} summarizes point estimates and 95\% posterior intervals of the TE, NDE, JNIE$_{123}$, JNIE$_{12}$, JNIE$_{23}$,  JNIE$_{13}$, and the individual NIEs.  Figure \ref{fig7} depicts boxplots of the entire posterior distributions of these quantities.  The NDE, representing the direct effect of a scrubber on ambient \PMTwo that is not mediated through any emissions changes, is estimated with posterior mean (95\% posterior interval) -0.74 (-1.67, 0.18) $\mu g /m^3$.  This represents the amount by which \PMTwo would change if a scrubber were installed but emissions of \SOTwo, \NOx, and \COTwo were somehow fixed at what they would have been without the scrubber.  The indirect effect via all three emissions (JNIE$_{123}$) is estimated with posterior mean (95\% posterior interval) -0.44 (-0.70, -0.23), which represents the reduction in \PMTwo that would occur around a plant with a scrubber relative to what would happen if emissions of \SOTwo, \NOx, and \COTwo were somehow changed to what they would have been absent the scrubber.  The relative magnitudes of the TE and JNIE$_{123}$ indicate that over one third of the total effect of a scrubber on ambient \PMTwo is jointly mediated through changes in \SOTwo, \NOx, and \COTwo.  The NIE for \SOTwo (NIE$_1$) is estimated with posterior mean (95\% posterior interval) -0.36 (-0.53, -0.22), indicating that most of the joint indirect effect is due to reductions in \SOTwo.  The NIE for \NOx (NIE$_2$) is also slightly negative (posterior mean (95\% posterior interval) -0.10 (-0.26, 0.06)), and the NIE for \COTwo (NIE$_3$) is estimated to be very close to 0 (posterior mean (95\% posterior interval) 0.02 (-0.04, 0.08)).  Estimates of the joint indirect effects that involve \SOTwo (JNIE$_{12}$ and JNIE$_{13}$) are close in magnitude to that of JNIE$_{123}$.  
The estimated JNIE$_{12}$ is nearly identical to the JNIE$_{123}$, indicating that reductions in both \SOTwo and \NOx account for nearly all of the joint mediated effect of all emissions.  The estimated JNIE$_{13}$ is nearly identical to NIE$_1$, indicating that combining causal reductions of \SOTwo with causal reductions of \COTwo does not significantly change the mediated effect over what is due to reductions in \SOTwo alone.  
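
The shares quoted above can be checked against the posterior means in Table \ref{tab:mediation}.  The following is a rough calculation based on point estimates only; ratios of posterior means need not equal posterior means of ratios, and no interval is attached to them:
\begin{align*}
\text{NDE} + \text{JNIE}_{123} &= -0.735 + (-0.444) = -1.179 = \text{TE},\\
\text{JNIE}_{123} / \text{TE} &= 0.444 / 1.179 \approx 0.38,\\
\text{NDE} / \text{TE} &= 0.735 / 1.179 \approx 0.62,\\
\text{NIE}_1 / \text{JNIE}_{123} &= 0.364 / 0.444 \approx 0.82.
\end{align*}
By comparison, the three mediator-specific estimates sum to $-0.364 - 0.097 + 0.019 = -0.442$, which is close to but not identical to the estimate of JNIE$_{123}$ ($-0.444$); this is in keeping with definitions of indirect effects that do not impose additivity.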

The overall conclusion of the causal mediation analysis is that 1) \SOTwo scrubbers have a strong direct effect on ambient \PMTwo that is not mediated through changes in emissions of \SOTwo, \NOx, or \COTwo; 2) slightly over one third of the total effect of scrubbers on \PMTwo is mediated through changes in these emissions, with a strong indirect effect due to reductions in \SOTwo accounting for most of the indirect mediated effect; 3) there is some evidence of a mediated effect due to changes in \NOx emissions; and 4) there is no evidence of mediation due to reductions in \COTwo emissions.  Appendix \ref{app:pp:overlap} provides an examination of the extent to which the indirect effects of the three emissions overlap with one another.

\begin{table}[h]
\centering \caption{Posterior mean and 95\% probability intervals for total, direct, and indirect effects of \SOTwo scrubbers.}\label{tab:mediation}
\begin{tabular}{c|ccc}
 & TE & $\text{JNIE}_{123}$ & NDE \\
 Mean & -1.179 & -0.444 & -0.735\\ 
 95\% P.I. & (-2.091, -0.290) & (-0.697, -0.231) & (-1.666, 0.176) \\
\hline
 & $\text{NIE}_1$ & $\text{NIE}_2$ & $\text{NIE}_3$ \\
 Mean & -0.364 & -0.097 & 0.019 \\
 95\% P.I. & (-0.530, -0.220) & (-0.257, 0.063) & (-0.038, 0.080) \\
\hline
 & $\text{JNIE}_{12}$ & $\text{JNIE}_{23}$ & $\text{JNIE}_{13}$ \\
 Mean & -0.462 & -0.079 & -0.345\\
 95\% P.I. & (-0.706, -0.271) & (-0.242, 0.091) & (-0.509, -0.191)\\
\end{tabular}
\end{table}
\begin{figure}[h]
  \centering
  \includegraphics[scale=0.75]{Figures/dist.pdf}
\caption{Posterior distributions of direct and indirect effects in the analysis of \SOTwo scrubbers. }
\label{fig7}
\end{figure} 



\subsection{Conclusion and Discussion of Case Study 2}\label{arpdiscussion}
In this case study, we have evaluated the effectiveness of a specific regulatory intervention (presence of an \SOTwo scrubber on a coal-fired power plant in 2005) in terms of 1) the intervention's causal effect on annual emissions of \SOTwo, \NOx, and \COTwo in 2005; 2) the intervention's causal effect on average annual ambient \PMTwo in 2005 among monitors within 150km of a power plant; and 3) the extent to which the intervention's causal effect on ambient \PMTwo is mediated through causal reductions in multiple emissions.  We focus in particular on the question of mediation to provide the first empirical evidence of the presumed causal relationships that motivate emissions control interventions that were a key part of the ARP and continue as important strategies for improving ambient air quality and, ultimately, human health.  Given that our questions of interest, and indeed many accountability questions, pertain to mediated effects of multiple pollutants that are measured contemporaneously and possibly interact with one another, we developed new methods for principal stratification and causal mediation analysis for multiple contemporaneous and non-independent mediators.  We introduced Bayesian nonparametric modeling and estimation techniques to fit flexible models to the observed data, and linked observed data distributions into joint distributions of potential outcomes using explicit and transparent assumptions (mostly contained in Appendix \ref{app:pp:assumptions}).

A key feature of our case study is the integration of a principal stratification analysis and a causal mediation analysis that rely on the exact same modeling assumptions for the observed data.  The difference between these two analyses pertains to the presence or absence of assumptions about potential outcomes that are {\it unobservable} for every member of the study sample.  Thus, we begin with a principal stratification analysis relying on assumptions for the observable outcomes $Y(z, \bM(z,z,z)), \bM(z,z,z)$ to
identify the principal causal effects, and then augment these assumptions with assumptions about unobservable potential outcomes (e.g., $Y(1, \bM(0,0,0))$) to estimate mediation effects. The explicit connection between principal stratification and causal mediation analyses explored here represents, to our knowledge, the most comprehensive consideration of these two approaches and the implications of their results in the context of a single analysis.  

The results of the principal stratification and causal mediation analyses should be interpreted jointly, and are, in this case study, largely consistent with one another.  The estimated dissociative effects were significantly less than zero for all combinations of emissions, indicating that scrubbers reduced \PMTwo even without changing emissions significantly.  This is consistent with the result of the mediation analysis where the estimated NDE accounted for nearly two thirds of the total effect of a scrubber on ambient \PMTwo.  The difference between associative and dissociative effects was most pronounced when considering emissions of \SOTwo, either individually or in combination with other emissions.  This indicates that power plants that exhibited large causal effects on \SOTwo emissions also exhibited large effects on ambient \PMTwo.  This conclusion is consistent with the results of the mediation analysis where the NIE of \SOTwo and the JNIEs involving \SOTwo (JNIE$_{12}$, JNIE$_{13}$, and JNIE$_{123}$) were all significantly less than zero and similar in magnitude to one another.  Furthermore, the magnitude of the associative effects relative to the NDE and NIEs is consistent with the well-known result that, in general, associative effects are a mixture of direct and indirect effects.  For example, the EAE for plants that exhibited large reductions in \SOTwo was estimated to be -1.402, which is similar to the sum of the estimates of NDE$= -0.74$ and NIE$_1 = -0.36$.
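
Written out with the rounded posterior means reported in Tables \ref{tab:ps} and \ref{tab:mediation}, this last comparison is
\begin{equation*}
\text{NDE} + \text{NIE}_1 = -0.74 + (-0.36) = -1.10, \qquad \text{EAE}_1 \text{ for \SOTwo} = -1.40 ,
\end{equation*}
so the two quantities agree in sign and in rough magnitude, differing by about 0.30 $\mu g/m^3$.  This gap is small relative to the widths of the 95\% posterior intervals for the NDE, $(-1.67, 0.18)$, and for this EAE, $(-2.32, -0.57)$, and the comparison is based on point estimates only.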

Interpretation of the results of this case study should be viewed in light of several important limitations.  First is the relative simplicity with which we linked power plants to monitors.  Specifically, our strategy simply links power plants to all of the ambient monitors within 150km.  Thus, our analysis is of the causal effects of scrubbers on average \PMTwo measured within 150km, which likely does not reflect the full effect of emissions changes from power plants on ambient air quality, which are expected to have implications at distances much greater than 150km.  Furthermore, when a monitor is located within 150km of more than one power plant, it is assigned to its closest power plant, which results in many power plants being assigned no monitors and therefore being excluded from our analysis.  Thus, our analysis assumes that impacts on ambient \PMTwo due to emissions changes in these excluded power plants are distributed evenly across the areas surrounding the scrubber and no-scrubber power plants included in the analysis.  More sophisticated strategies to link ambient monitors to power plants based on features such as atmospheric conditions and weather patterns are warranted, but analysis of the data constructed here represents an important approximation that still yields valuable inferences, especially with respect to quantifying causal pathways.  

Another important limitation of this analysis is that it assumes that the factors listed in Table \ref{arpdatatab} are sufficient to control for confounding, which in this case would consist of differences between power plants or other features related to ambient \PMTwo that are also associated with whether a power plant had a scrubber installed in 2005.  

To develop our new methods, we also considered analysis of a single year and regarded a power plant as ``treated'' if it had a scrubber installed in January 2005, without regard to how long the scrubber had been installed or to changes in emissions and ambient \PMTwo over time.  Future work will develop a framework to accommodate longitudinal analysis by using Bayesian dynamic models, which could update information from the past and smooth the effects over the course of a several-year time period \citep{kim_longitudinal_2015}.  

The results of our analysis are largely consistent with expectations: \SOTwo scrubbers appear to causally reduce ambient \PMTwo (within 150km), and this causal effect is mediated to a considerable extent by causal reductions in \SOTwo emissions and is not considerably mediated through reductions in \NOx or \COTwo emissions.  The finding that there appears to be a considerable direct effect of \SOTwo scrubbers on \PMTwo that is not mediated through emissions of \SOTwo, \NOx, or \COTwo is somewhat surprising.  More work is warranted to learn about possible explanations of this direct effect, which could relate to reduction of other emissions (e.g., primary particles), features of secondary formation of \PMTwo that are not directly captured by direct emissions, or simply the attribution of direct effects to phenomena that truly relate to unmeasured confounding.  

Despite the limitations of this case study, we have provided the first empirical investigation of the presumed causal pathways that motivate a variety of air quality control strategies that aim to reduce harmful emissions from power plants.  Using a principled causal inference framework and rigorous analysis to quantify causal pathways, we have evaluated the effectiveness of scrubber installation for reducing emissions and ambient \PMTwo, representing an analysis of two important links in the chain of accountability amid the realities of actual regulatory implementation.  The health implications of our analysis rely on the presumed link between ambient \PMTwo and health outcomes, but the methods presented here can be applied in other multipollutant accountability settings, including extensions of the current analysis to investigate, for example, the extent to which reductions in multipollutant emissions mediate causal health effects or the extent to which scrubber-induced changes in ambient \PMTwo (or other pollutants) mediate causal effects on health outcomes.  

\subsubsection{Additional Work in Progress}
In addition to the analysis of Case Study 2, we have a variety of ongoing analyses of causal impacts of power plant emissions controls.   We have rigorously evaluated the causal impact of \SOTwo and \NOx control strategies on \SOTwo and \NOx emissions among 995 coal-burning EGUs during the years 1997-2012.  We are also extending the analysis of Case Study 2 to investigate the extent to which the causal effect of \SOTwo scrubber installation on Medicare health outcomes is mediated through reductions in ambient \PMTwo, which requires more sophisticated procedures for linking data between power plants, ambient monitors, and residential zip codes of Medicare beneficiaries.  

\newpage
\section{Discussion and Conclusions}
Over the past ten years, important progress in accountability assessment has initiated a new dimension to the scientific evidence available for informing policy decisions.  While important challenges remain, the perspectives and methods in this report represent progress towards rigorous evaluation of large-scale regulatory policies.  Sharpening the distinction between analytic perspectives for exposure-response estimation and for estimating causal effects of well-defined actions is necessary in order to advance accountability assessment beyond evaluation of localized, abrupt actions and towards informing policy debates with evidence of the effects of broad and complex regulations.  We have outlined the particular relevance of potential outcomes methods for causal inference for advancing the goals of accountability assessment to focus on the direct evaluation of the effectiveness of specific policies or actions.  Our analysis in the \nameref{pm10nonattainment} Section illustrated how potential-outcomes reasoning can be deployed towards the goals of long-term direct accountability assessment.  The \nameref{arpcasestudy} Section outlined the development of new statistical methods for multipollutant accountability assessment, and illustrated how potential-outcomes perspectives can be useful for quantifying various causal pathways through which an air quality intervention impacts outcomes.  

The deployment of potential outcomes methods for direct accountability assessment represents an important new direction for accountability research and air pollution epidemiology more broadly.   Defining causal consequences of well-defined actions, what we refer to as direct accountability, stands in important contrast to the study of how air pollution exposure relates to the onset of clinical disease.  Thus, the analytic perspectives and associated statistical methods here are consistent with a recent emphasis on so-called ``consequentialist epidemiology'' that turns focus away from identifying underlying causes of disease and towards development of consequential interventions \citep{galea_argument_2013}.  This is not to diminish the importance of a vast array of air pollution epidemiological evidence that motivates the need to intervene to control population exposures, but rather to emphasize the need to provide equally strong evidence of the consequences of specific interventional strategies to protect public health and the environment.   While no single analytic strategy can overcome all the challenges inherent to accountability assessment, the best science should be generated from a variety of available approaches. We argue that rigorous efforts to directly evaluate causal effects of well-defined regulatory interventions constitute one such approach that, while distinct from traditional epidemiological tools, is essential to the current regulatory climate.


\newpage
\singlespacing
\bibliographystyle{biom}
\bibliography{HEIAccountabilityGrant}

\end{document}




%%%%%%%%%%%%%%%%   Summary of Barrett's Emissions Paper    %%%%%%%%%%%%%%
\subsubsection{Causal Effects on Emissions}
	The 1990 amendment to the Clean Air Act implemented a cap-and-trade system that required electricity-generating power plants to dramatically reduce Sulfur Dioxide (\SOTwo) and Nitrogen Oxide (\NOx) emissions. Plants impacted by this legislation had a variety of compliance options, including decreasing factory operation, purchasing carbon credits, installing scrubbers, and changing fuel inputs. Using data from 1997-2012 on 995 coal-burning power plants, we examine the effectiveness of scrubber installation in reducing \SOTwo and \NOx emissions. Specifically, we employ two methods, a propensity score algorithm and a matching algorithm, to estimate 1) the causal effect of scrubber installation prior to 1997 on emissions during 1997; and 2) the causal effect of scrubber installation at any time during the period 1997-2012 on emissions two months following scrubber installation. Using a propensity score method, we found that pre-1997 \SOTwo scrubbers reduced 1997 \SOTwo emissions by 68\% (95\% CI 58\% to 76\%), and pre-1997 \NOx scrubbers reduced 1997 \NOx emissions by 28\% (16\%, 38\%). Additionally, installing \SOTwo and \NOx scrubbers at any time during the period 1997-2012 reduced \SOTwo and \NOx emissions by 89\% (88\%, 90\%) and 21\% (19\%, 24\%) two months following installation, respectively. These final two results are corroborated by a matching algorithm, which finds that scrubbers cause \SOTwo and \NOx emissions to decline by 88\% (87\%, 89\%) and by 20\% (17\%, 22\%) two months following installation, respectively.

